(54) Title: GENES AMPLIFIED IN CANCER CELLS
(57) Abstract 

New methods are disclosed for detecting cancer-associated
genes, and obtaining corresponding cDNA sequences. The
methods involve supplying RNA preparations from control cells,
and from a plurality of different cancer cells that share a
duplicated or deleted gene in the same region of a chromosome.
Amplified cDNA copies are displayed, and are selected based
on differences in abundance of RNA between preparations.
Optional additional screening steps involve surveying panels of
cancer cells using the cDNA for RNA overabundance with or
without gene duplication. The identified genes can be used
in turn to develop materials and techniques for diagnosing and
treating the underlying cancer. Four novel genes associated with
cancer have been identified. In at least about 60% of the breast
cancer cell lines tested, RNA hybridizing with the cDNAs was
substantially more abundant than in normal cells. Most of the
cell lines also showed a duplication of the corresponding gene,
which probably contributed to the increased level of RNA in the
cell. However, for each of the four genes, there were some cell
lines which had RNA overabundance without gene duplication.
This suggests that the gene product is sufficiently important
to the cancer process that cells will use several alternative
mechanisms to achieve increased expression.

GENES AMPLIFIED IN CANCER CELLS

PRIORITY CLAIM

This application claims the priority benefit of the following U.S. Patent applications:

60/015,167, filed April 9, 1996; 60/019,202, filed June 6, 1996; 08/678,280, filed July 10, 1996. For
purposes of prosecution in the U.S., the aforementioned applications are hereby incorporated herein
by reference in their entirety.

TECHNICAL FIELD

The present invention relates generally to the field of human genetics. More specifically, it
relates to the identification of novel genes associated with overabundance of RNA in human cancer,
such as breast cancer. It pertains especially to those genes and the products thereof which may be
important in diagnosis and treatment.

BACKGROUND OF THE INVENTION

Cancer is a heterogeneous disease. It manifests itself in a wide variety of tissue sites, with
different degrees of de-differentiation, invasiveness, and aggressiveness. Some forms of cancer
are responsive to traditional modes of therapy, but many are not. For most common cancers, there
is a pressing need to improve the arsenal of therapies available to provide more precise and more
effective treatment in a less invasive way.

As an example, breast cancer has an unsatisfactory morbidity and mortality, despite
presently available forms of medical intervention. Traditional clinical initiatives are focused on early
diagnosis, followed by surgery and chemotherapy. Such interventions are of limited success,
particularly in patients where the tumor has undergone metastasis.

The heterogeneous nature of cancer arises because different cancer cells achieve their
growth and pathological properties by different phenotypic alterations. Alteration of gene
expression is intimately related to the uncontrolled growth and de-differentiation that are hallmarks
of cancer. Certain similar phenotypic alterations in turn may have a different genetic base in
different tumors. Yet, the number of genes central to the malignant process must be a finite one.
Accordingly, new pharmaceuticals that are tailored to specific genetic alterations in an individual
tumor may be more effective.

There are two types of altered gene expression that take place, together or independently,
in different cancer cells (reviewed by Bishop). The first type is the decreased expression of
recessive genes, known as tumor suppressor genes, that apparently act to prevent malignant
growth. The second type is the increased expression of dominant genes, such as oncogenes, that
act to promote malignant growth, or to provide some other phenotype critical for malignancy. Thus,
alteration in the expression of either type of gene is a potential diagnostic indicator. Furthermore, a
treatment strategy might seek to reinstate the expression of suppressor genes, or reduce the
expression of dominant genes. The present invention is directed to identifying genes of either type,
particularly those of the second type.

The most frequently studied mechanism for gene overexpression in cancer cells is
sometimes referred to as amplification. This is a process whereby the gene is duplicated within the
chromosomes of the ancestral cell into multiple copies. The process involves unscheduled
replications of the region of the chromosome comprising the gene, followed by recombination of the
replicated segments back into the chromosome (Alitalo et al.). As a result, 50 or more copies of
the gene may be produced. The duplicated region is sometimes referred to as an "amplicon". The
level of expression of the gene (that is, the amount of messenger RNA produced) escalates in the
transformed cell in the same proportion as the number of copies of the gene that are made (Alitalo
et al.).

Several human oncogenes have been described, some of which are duplicated, for
example, in a significant proportion of breast tumors. A prototype is the erbB2 gene (also known
as HER-2/neu), which encodes a 185 kDa membrane growth factor receptor homologous to the
epidermal growth factor receptor. erbB2 is duplicated in 61 of 283 tumors (22%) tested in a recent
survey (Adnane et al.). Other oncogenes duplicated in breast cancer are the bek gene, duplicated
in 34 out of 286 (12%); the flg gene, duplicated in 37 out of 297 (12%); and the myc gene, duplicated in
43 out of 275 (16%) (Adnane et al.).
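
The duplication frequencies quoted above follow directly from the reported counts. A minimal Python sketch, using only the counts given in the text, reproduces the rounded percentages; the dictionary layout, the gene keys, and the function name are illustrative assumptions and not part of the cited survey.

```python
# Tumors with gene duplication / tumors tested, as quoted in the text
# (Adnane et al.). Keys are short labels for the four genes discussed.
duplication_counts = {
    "erbB2": (61, 283),
    "bek": (34, 286),
    "flg": (37, 297),
    "myc": (43, 275),
}


def percent_duplicated(duplicated, tested):
    """Return the duplication frequency as a whole-number percentage."""
    return round(100 * duplicated / tested)


# Map each gene label to its rounded duplication frequency.
frequencies = {
    gene: percent_duplicated(duplicated, tested)
    for gene, (duplicated, tested) in duplication_counts.items()
}
```

Rounding to the nearest whole number gives the same figures as the text for all four genes.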

Work with other oncogenes, particularly those described for neuroblastoma, suggested that
gene duplication of the proto-oncogene was an event involved in the more malignant forms of
cancer, and could act as a predictor of clinical outcome (reviewed by Schwab et al. and Alitalo et
al.). In breast cancer, duplication of the erbB2 gene has been reported as correlating both with
reoccurrence of the disease and decreased survival times (Slamon et al.). There is some evidence
that erbB2 helps identify tumors that are responsive to adjuvant chemotherapy with
cyclophosphamide, doxorubicin, and fluorouracil (Muss et al.).

It is clear that only a proportion of the genes that can undergo gene duplication in cancer
have been identified. First, chromosome abnormalities, such as double minute (DM) chromosomes
and homogeneously stained regions (HSRs), are abundant in cancer cells. HSRs are
chromosomal regions that appear in karyotype analysis with intermediate density Giemsa staining
throughout their length, rather than with the normal pattern of alternating dark and light bands.
They correspond to multiple gene repeats. HSRs are particularly abundant in breast cancers,
showing up in 60-65% of tumors surveyed (Dutrillaux et al., Zafrani et al.). When such regions are
checked by in situ hybridization with probes for any of 16 known human oncogenes, including
erbB2 and myc, only a proportion of tumors show any hybridization to HSR regions. Furthermore,
only a proportion of the HSRs within each karyotype are implicated.

Second, comparative genomic hybridization (CGH) has revealed the presence of copy
number increases in tumors, even in chromosomal regions outside of HSRs. CGH is a new
method in which whole chromosome spreads are stained simultaneously with DNA fragments from
normal cells and from cancer cells, using two different fluorochromes. The images are
computer-processed for the fluorescence ratio, revealing chromosomal regions that have
undergone amplification or deletion in the cancer cells (Kallioniemi et al. 1992). This method was
recently applied to 15 breast cancer cell lines (Kallioniemi et al. 1994). DNA sequence copy
number increases were detected in all 23 chromosome pairs.
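
The ratio-based readout of CGH can be illustrated with a short Python sketch. It assumes that a fluorescence intensity has already been measured per chromosomal region for the tumor and the normal fluorochrome. The cutoff values, region names, and intensities are hypothetical and are not taken from the cited work.

```python
def classify_cgh_regions(tumor_signal, normal_signal,
                         gain_cutoff=1.25, loss_cutoff=0.75):
    """Label each region by its tumor/normal fluorescence ratio.

    The cutoffs are hypothetical illustration values, not published ones.
    """
    calls = {}
    for region, tumor in tumor_signal.items():
        ratio = tumor / normal_signal[region]
        if ratio >= gain_cutoff:
            calls[region] = "gain"      # candidate amplification
        elif ratio <= loss_cutoff:
            calls[region] = "loss"      # candidate deletion
        else:
            calls[region] = "balanced"  # no copy number change called
    return calls


# Made-up intensities for four chromosome arms.
example_calls = classify_cgh_regions(
    {"1q": 2.4, "8q": 1.9, "13q": 0.5, "2p": 1.0},
    {"1q": 1.0, "8q": 1.0, "13q": 1.0, "2p": 1.0},
)
```

As the text notes, a call of this kind localizes a change only to an approximate chromosomal region; it does not identify the gene involved.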

Cloning the genes that undergo duplication in cancer is a formidable challenge. In one
approach, human oncogenes have been identified by hybridizing with probes for other known
growth-promoting genes, particularly known oncogenes in other species. For example, the erbB2
gene was identified using a probe from a chemically induced rat neuroglioblastoma (Slamon et al.).
Genes with novel sequences and functions will evade this type of search. In another approach,
genes may be cloned from an area identified as containing a duplicated region by the CGH method.
Since CGH is able to indicate only the approximate chromosomal region of duplicated genes, an
extensive amount of experimentation is required to walk through the entire region and identify the
particular gene involved.

Genes may also be overexpressed in cancer without being duplicated. Methods that rely
on identification from genetic abnormalities necessarily bypass such genes. Increased expression
can come about through a higher level of transcription of the gene; for example, by up-regulation of
the promoter or substitution with an alternative promoter. It can also occur if the transcription
product is able to persist longer in the cell; for example, by increasing the resistance to cytoplasmic
RNase or by reducing the level of such cytoplasmic enzymes. Two examples are the epidermal
growth factor receptor, overexpressed in 45% of breast cancer tumors (Klijn et al.), and the IGF-1
receptor, overexpressed in 50-93% of breast cancer tumors (Berns et al.). In almost all cases, the
overexpression of each of these receptors is by a mechanism other than gene duplication.

One way of examining overexpression at the messenger RNA level is by subtractive
hybridization. It involves producing positive and negative cDNA strands from two RNA
preparations, and looking for cDNA which is not completely hybridized by the opposing preparation.
This is a laborious procedure which has distinct limitations in cancer research. In particular, since
each subtraction involves cDNA from only two cell populations at a time, it is sensitive to individual
phenotypic differences due not just to the presence of cancer, but also to natural metabolic
variations.

Another way of examining overexpression at the messenger RNA level is by differential
display (Liang et al. 1992a). In this technique, cDNA is prepared from only a subpopulation of each
RNA preparation, and expanded via the polymerase chain reaction using primers of particular
specificity. Similar subpopulations are compared across several RNA preparations by gel
autoradiography for expression differences. In order to survey the RNA preparations entirely, the
assay is repeated with a comprehensive set of PCR primers. The screening strategy more
effectively includes multiple positive and negative control samples (Sunday et al.). The method has
recently been applied to breast cancer cell lines, and highlights a number of expression differences
(Liang et al. 1992b; Chen et al., McKenzie et al., Watson et al. 1994 & 1996, Kocher et al.). By
excising the corresponding region of the separating gel, it is possible to recover and sequence the
cDNA.

Despite the advancement provided by differential display, problems remain in terms of
applying it in the search for new cancer genes. First, because this is a test for RNA levels, any
phenotypic difference between cell lines constitutes part of the recovered set, leading to a large
proportion of "false positive" identifications. It has been found that cDNA for mitochondrial genes
constitute a large proportion of the differentially expressed bands, and it consumes substantial
resources to recover the sample and obtain a partial sequence in order to eliminate them. Second,
false positive identifications are made for reasons attributed to multiple cDNA species and
competition for the PCR primers by RNA species of different abundance (Debouck). Third,
differential display highlights high copy number mRNAs and shorter mRNAs (Bertioli et al.,
Yeatman et al.), and may therefore miss critical cancer-associated transcripts when used as a
survey technique. Fourth, a number of adjustments are made to gene expression levels when a
cell undergoes malignant transformation or is cultured in vitro. Most of these adjustments are
secondary, and not part of the transformation process. Thus, even when a novel sequence is
obtained from the differential display, it is far from certain that the corresponding gene is at the root
of the disease process.

An early step in developing gene-specific therapeutic approaches is the identification of
genes that are more central to malignant transformation or the persistence of the malignant
phenotype.

DISCLOSURE OF THE INVENTION

It is an objective of this invention to provide a method for identifying and characterizing
genes and gene products which are duplicated or associated with overabundant RNA in cancer
cells. The method can be used for any type of cancer, providing a plurality of cell populations or
cell lines of the type of cancer are available, in conjunction with a suitable control cell population.
The method is highly effective in identifying genes and gene products that are intimately related to
malignant transformation or maintenance of the malignant properties of the cancer cells.

An important derivative of applying the method is the selection and retrieval of cDNA and
cDNA fragments corresponding to the cancer-associated gene. These fragments can be used
inter alia to determine the nucleotide sequence of the gene and mRNA, the amino acid sequence of
any encoded protein, or to retrieve from a cDNA or genomic library additional polynucleotides
related to the gene or its transcripts. Since the genes are typically involved in the malignant
process of the cell, the polynucleotides, polypeptides, and antibodies derived by using this method
can in turn be used to design or screen important diagnostic reagents and therapeutic compounds.

Another objective of this invention is to provide isolated polynucleotides, polypeptides, and
antibodies derived from four novel genes which are associated with several different types of cancer,
including breast cancer. The genes are designated CH1-9a11-2, CH8-2a13-1, CH13-2a12-1, and
CH14-2a16-1. These designations refer to both strands of the cDNA and fragments thereof, and to
the respective corresponding messenger RNA, including splice variants, allelic variants, and
fragments of any of these forms. These genes show RNA overabundance in a majority of cancer cell
lines tested. A majority of the cells showing RNA overabundance also have duplication of the
corresponding gene. Another object of this invention is to provide materials and methods based on
these polynucleotides, polypeptides, and antibodies for use in the diagnosis and treatment of cancer,
particularly breast cancer.

Accordingly, one embodiment of this invention is an isolated polynucleotide comprising a
linear sequence contained in a polynucleotide selected from the group consisting of CH1-9a11-2,
CH8-2a13-1, CH13-2a12-1, and CH14-2a16-1. The linear sequence is contained in a duplicated
gene or overabundant RNA in cancerous cells. The RNA may be overabundant due to gene
duplication, increased RNA transcription or processing, increased RNA persistence, any combination
thereof, or by any other mechanism, in a proportion of breast cancer cells. Preferably, the RNA is
overabundant in at least about 20% of a representative panel of breast cancer cell lines, such as the
panels listed herein; more preferably, it is overabundant in at least about 40% of the panel; even more
preferably, it is overabundant in at least 60% or more of the panel. Preferably, the RNA is
overabundant in at least about 5% of spontaneously occurring breast cancer tumors; more preferably,
it is overabundant in at least about 10% of such tumors; more preferably, it is overabundant in at least
about 20% of such tumors; more preferably, it is overabundant in at least about 30% of such tumors;
even more preferably, it is overabundant in at least about 50% of such tumors.

Preferably, a sequence of at least 10 nucleotides is essentially identical between the isolated
polynucleotide of the invention and a cDNA from CH1-9a11-2, CH8-2a13-1, CH13-2a12-1, and
CH14-2a16-1; more preferably, a sequence of at least about 15 nucleotides is essentially identical; more
preferably, a sequence of at least about 20 nucleotides is essentially identical; more preferably, a
sequence of at least about 30 nucleotides is essentially identical; more preferably, a sequence of at
least about 40 nucleotides is essentially identical; even more preferably, a sequence of at least about
70 nucleotides is essentially identical; still more preferably, a sequence of about 100 nucleotides or
more is essentially identical. A further embodiment of this invention is an isolated polynucleotide
comprising a linear sequence essentially identical to a sequence selected from the group consisting of
SEQ. ID NO:15, SEQ. ID NO:18, SEQ. ID NO:21, SEQ. ID NO:23, SEQ. ID NO:26, SEQ. ID NO:29,
SEQ. ID NO:31, SEQ. ID NO:33, and SEQ. ID NO:35. These embodiments include an isolated
polynucleotide which is a DNA polynucleotide, an RNA polynucleotide, a polynucleotide probe, or a
polynucleotide primer.

This invention also provides an isolated polypeptide comprising a sequence of amino acids
essentially identical to the polypeptide encoded by or translated from a polynucleotide selected from
the group consisting of CH1-9a11-2, CH8-2a13-1, CH13-2a12-1, and CH14-2a16-1. Preferably, a
sequence of at least about 5 amino acids is essentially identical between the polypeptide of this
invention and that encoded by the polynucleotide; more preferably, a sequence of at least about 10
amino acids is essentially identical; more preferably, a sequence of at least 15 amino acids is
essentially identical; even more preferably, a sequence of at least 20 amino acids is essentially
identical; still more preferably, a sequence of about 30 amino acids or more is essentially identical.
Preferably, the polypeptide comprises a linear sequence of at least 15 amino acids essentially
identical to a sequence encoded by said polynucleotide. Another embodiment of this invention is a
polypeptide comprising a linear sequence essentially identical to a sequence selected from the group
consisting of SEQ. ID NO:17, SEQ. ID NO:20, SEQ. ID NO:25, SEQ. ID NO:28, SEQ. ID NO:30,
SEQ. ID NO:32, SEQ. ID NO:34, and SEQ. ID NO:37.

A further embodiment of this invention is an antibody specific for a polypeptide embodied in
this invention. This encompasses both monoclonal and isolated polyclonal antibodies.

A further embodiment of this invention is a method of using the polynucleotides of this
invention for detecting or measuring gene duplication in cancerous cells, especially but not limited to
breast cancer cells, comprising the steps of reacting DNA contained in a clinical sample with a
reagent comprising the polynucleotide, said clinical sample having been obtained from an individual
suspected of having cancerous cells; and comparing the amount of complexes formed between the
reagent and the DNA in the clinical sample with the amount of complexes formed between the
reagent and DNA in a control sample.
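
The comparison step of such a method reduces to a ratio of two measured signals. The Python sketch below is illustrative only: the function names, the example signal values, and the two-fold cutoff are assumptions, since the text does not specify how the amounts of complexes are quantified or what difference is considered meaningful.

```python
def relative_signal(sample_complexes, control_complexes):
    """Ratio of complexes formed in the clinical sample versus the control."""
    if control_complexes <= 0:
        raise ValueError("control signal must be positive")
    return sample_complexes / control_complexes


def suggests_duplication(sample_complexes, control_complexes, min_fold=2.0):
    """Flag a sample whose signal exceeds the control by a chosen fold.

    min_fold is a hypothetical cutoff chosen for illustration.
    """
    return relative_signal(sample_complexes, control_complexes) >= min_fold


# Made-up densitometry readings in arbitrary units.
flagged = suggests_duplication(450.0, 100.0)
not_flagged = suggests_duplication(110.0, 100.0)
```

The same comparison applies unchanged when the measured complexes are formed with RNA, polypeptide, or antibody rather than DNA.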

A further embodiment is a method of using the polynucleotides of this invention for detecting
or measuring overabundance of RNA in cancerous cells, especially but not limited to breast cancer
cells, comprising the steps of reacting RNA contained in a clinical sample with a reagent comprising
the polynucleotide, said clinical sample having been obtained from an individual suspected of having
cancerous cells; and comparing the amount of complexes formed between the reagent and the RNA
in the clinical sample with the amount of complexes formed between the reagent and RNA in a control
sample.

Another embodiment of this invention is a diagnostic kit for detecting or measuring gene
duplication or RNA overabundance in cells contained in an individual as manifest in a clinical sample,
comprising a reagent and a buffer in suitable packaging, wherein the reagent comprises a
polynucleotide of this invention.

Another embodiment of this invention is a method of using a polypeptide of this invention for
detecting or measuring specific antibodies in a clinical sample, comprising the steps of reacting
antibodies contained in the clinical sample with a reagent comprising the polypeptide, said clinical
sample having been obtained from an individual suspected of having cancerous cells, especially but
not limited to breast cancer cells; and comparing the amount of complexes formed between the
reagent and the antibodies in the clinical sample with the amount of complexes formed between the
reagent and antibodies in a control sample.

Another embodiment of this invention is a method of using an antibody of this invention for
detecting or measuring altered protein expression in a clinical sample, comprising the steps of
reacting a polypeptide contained in the clinical sample with a reagent comprising the antibody, said
clinical sample having been obtained from an individual suspected of having cancerous cells,
especially but not limited to breast cancer cells; and comparing the amount of complexes formed
between the reagent and the polypeptide in the clinical sample with the amount of complexes formed
between the reagent and a polypeptide in a control sample. Further embodiments of this invention
are diagnostic kits for detecting or measuring a polypeptide or antibody present in a clinical sample,
comprising a reagent and a buffer in suitable packaging, wherein the reagent respectively comprises
either an antibody or a polypeptide of this invention.

Yet another embodiment of this invention is a host cell transfected by a polynucleotide of this
invention. A further embodiment of this invention is a method for using a polynucleotide for screening
a pharmaceutical candidate, comprising the steps of separating progeny of the transfected host cell
into a first group and a second group; treating the first group of cells with the pharmaceutical
candidate; not treating the second group of cells with the pharmaceutical candidate; and comparing
the phenotype of the treated cells with that of the untreated cells.

This invention also embodies a pharmaceutical preparation for use in cancer therapy,
comprising a polynucleotide or polypeptide embodied by this invention, said preparation being
capable of reducing the pathology of cancerous cells, especially but not limited to breast cancer
cells. Further embodiments of this invention are methods for treating an individual bearing cancerous
cells, such as breast cancer cells, comprising administering any of the aforementioned
pharmaceutical preparations.

Still another embodiment of this invention is a pharmaceutical preparation or active vaccine
comprising a polypeptide embodied by this invention in an immunogenic form and a pharmaceutically
compatible excipient. A further embodiment is a method for treatment of cancer, especially but not
limited to breast cancer, either prophylactically or after cancerous cells are present in an individual
being treated, comprising administration of the aforementioned pharmaceutical preparation.

Another series of embodiments of this invention relate to methods for obtaining cDNA
corresponding to a gene associated with cancer, comprising the steps of: a) supplying an RNA
preparation from uncultured control cells; b) supplying RNA preparations from at least two different
cancer cells; c) displaying cDNA corresponding to the RNA preparations of step a) and step b)
such that different cDNA corresponding to different RNA in each preparation are displayed
separately; d) selecting cDNA corresponding to RNA that is present in greater abundance in the
cancer cells of step b) relative to the control cells of step a); e) supplying a digested DNA
preparation from control cells; f) supplying digested DNA preparations from at least two different
cancer cells; g) hybridizing the cDNA of step d) with the digested DNA preparations of step e) and
step f); and h) further selecting cDNA from the cDNA of step d) corresponding to genes that are
duplicated in the cancer cells of step f) relative to the control cells of step e).
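
The logic of steps a) through h) can be viewed as two successive filters, one on RNA abundance and one on gene copy number. The Python sketch below is a simplified illustration, not the laboratory procedure: it assumes band intensities have already been quantified per cell population, it reads "greater abundance in the cancer cells" as requiring every cancer cell to exceed the control, and the band names, intensities, and two-fold cutoff are all hypothetical.

```python
def elevated_in_cancer_cells(measurements, control="control", min_fold=2.0):
    """True if at least two cancer cells each exceed the control by min_fold."""
    baseline = measurements[control]
    cancer_values = [v for cell, v in measurements.items() if cell != control]
    return len(cancer_values) >= 2 and all(
        value >= min_fold * baseline for value in cancer_values
    )


def select_cdna(rna_signal, dna_signal):
    """Apply the RNA filter (steps a-d), then the DNA filter (steps e-h)."""
    # Steps a)-d): keep cDNA whose RNA is more abundant in the cancer cells.
    rna_selected = [
        cdna for cdna, m in rna_signal.items() if elevated_in_cancer_cells(m)
    ]
    # Steps e)-h): of those, keep cDNA whose gene is duplicated in the cancer cells.
    return [
        cdna for cdna in rna_selected
        if cdna in dna_signal and elevated_in_cancer_cells(dna_signal[cdna])
    ]


# Hypothetical quantified signals (arbitrary units).
rna_signal = {
    "band_A": {"control": 1.0, "cancer_1": 5.0, "cancer_2": 4.0},
    "band_B": {"control": 1.0, "cancer_1": 1.1, "cancer_2": 6.0},
    "band_C": {"control": 1.0, "cancer_1": 3.0, "cancer_2": 3.5},
}
dna_signal = {
    "band_A": {"control": 2.0, "cancer_1": 8.0, "cancer_2": 6.0},
    "band_C": {"control": 2.0, "cancer_1": 2.0, "cancer_2": 2.0},
}
selected = select_cdna(rna_signal, dna_signal)
```

In this made-up data only band_A passes both filters: band_B fails the RNA filter in one cancer cell, and band_C shows RNA overabundance without duplication.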

One or more enhancements may optionally be included in the methods of this invention, 
including the following: 

1. Cancer cells are preferably used for step b) that share a duplicated gene in the same
   region of a chromosome. If desired, the practitioner may test cancer cells beforehand
   to detect the duplication or deletion of chromosome regions; or cancer cell lines may
   be used that have already been characterized in this respect.

2. A higher plurality of cancer cells are preferably used to provide DNA for step b), step f),
   or preferably both step b) and step f). The use of three cancer cells is preferred over
   two; the use of four cancer cells is more preferred, about five cancer cells is still more
   preferred, about eight cancer cells is even more preferred. The cDNA of each cancer
   cell population is displayed or hybridized separately, in accordance with the method.

3. A higher plurality of control cells are preferably used to provide DNA for step a), step
   e), or preferably both step a) and step e). The use of two control cell populations is
   preferred; the use of three or more is even more preferred. Both proliferating and
   non-proliferating populations are preferably used, if available.

4. The control cells are preferably supplied fresh from a tissue source, and are not
   cultured or transformed into a cell line. This is increasingly important when the control
   cell populations used in step a) is only one or two in number. Freshly obtained cancer
   cells may also be used as an alternative to cancer cell lines, although this is less
   critical.

5. An additional screening step is preferably conducted in which the cDNA corresponding
   to the putative cancer-associated gene is additionally hybridized with a digested
   mitochondrial DNA preparation, to eliminate mitochondrial genes. This screening step
   may be conducted before, between, subsequent to, or simultaneously with the other
   screening steps of the method.

6. An additional screening step is preferably conducted in which RNA is supplied from a
   plurality of cancer cells, and one or preferably more control cell populations; the RNA is
   contacted with cDNA corresponding to the putative cancer-associated gene under
   conditions that permit formation of a stable duplex, and cDNA is selected
   corresponding to RNA that is present in greater abundance in a proportion of the
   cancer cells relative to the control cells. Preferably, the plurality of cancer cells is a
   panel of at least five, preferably at least ten cells. Preferably at least three, more
   preferably at least five of the cancer cells show greater abundance of RNA. Preferably
   at least one and preferably more of the cancer cells shows a greater abundance of
   RNA compared with control cells, but does not show duplication of the corresponding
   gene in step h) of the method.
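
The panel criteria of enhancement 6 can be expressed as a simple check. The Python sketch below is a hedged illustration: the default thresholds (a panel of at least five cells, at least three with RNA overabundance, at least one with overabundance but no duplication) are the lowest preferred values stated in the text, while the function name, the data layout, and the example panels are assumptions.

```python
def passes_panel_screen(panel, min_panel_size=5, min_overabundant=3):
    """Check a candidate cDNA against the panel criteria of enhancement 6.

    panel: list of (rna_overabundant, gene_duplicated) booleans, one pair per
    cancer cell in the panel. This layout is an illustrative assumption.
    """
    if len(panel) < min_panel_size:
        return False
    overabundant = [cell for cell in panel if cell[0]]
    # Cells with RNA overabundance achieved without gene duplication.
    without_duplication = [cell for cell in overabundant if not cell[1]]
    return len(overabundant) >= min_overabundant and len(without_duplication) >= 1


# Hypothetical panels of five cancer cells each.
mixed_panel = [(True, True), (True, True), (True, False), (False, False), (False, False)]
duplication_only_panel = [(True, True), (True, True), (True, True), (False, False), (False, False)]

mixed_result = passes_panel_screen(mixed_panel)
duplication_only_result = passes_panel_screen(duplication_only_panel)
```

The second panel fails only because no cell shows overabundance without duplication, which is the feature the text treats as evidence that the gene product matters to the cancer process.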
Other embodiments of the invention are methods for obtaining cDNA corresponding to a
gene that is deleted or underexpressed in cancer, comprising the steps of: a) supplying an RNA
preparation from control cells; b) supplying RNA preparations from at least two different cancer
cells that share a deleted gene in the same region of a chromosome; c) displaying cDNA
corresponding to the RNA preparations of step a) and step b) such that different cDNA
corresponding to different RNA in each preparation are displayed separately; and d) selecting
cDNA corresponding to RNA that is present in lower abundance in the cancer cells of step b)
relative to the control cells of step a). Such methods typically comprise the following further steps:
e) supplying a digested DNA preparation from control cells; f) supplying digested DNA
preparations from at least two different cancer cells; g) hybridizing the cDNA of step d) with the
digested DNA preparations of step e) and step f); and h) further selecting cDNA from the cDNA of
step d) corresponding to a gene that is deleted in the cancer cells of step f) relative to the control
cells of step e). Such methods for identifying deleted or underexpressed genes may also comprise
enhancements such as those described above.

Additional embodiments of this invention are methods for characterizing cancer genes,
comprising obtaining cDNA corresponding to a cancer-associated gene according to a method of
this invention, particularly those highlighted above, and then sequencing the cDNA. Alternatively or
in addition, the cDNA may be used to rescue additional polynucleotides corresponding to a
cancer-associated gene from an mRNA preparation, or a cDNA or genomic DNA library.

Additional embodiments of this invention are methods for screening candidate drugs for
cancer treatment, comprising obtaining cDNA corresponding to a gene that is duplicated,
overexpressed, deleted, or underexpressed in cancer, and comparing the effect of the candidate
drug on a cell genetically altered with the cDNA or fragment thereof with the effect on a cell not
genetically altered.

Various embodiments of this invention may be employed in pursuit of any form of cancer
for which suitable tissue sources are available. Cancers of particular interest include lung cancer,
glioblastoma, pancreatic cancer, colon cancer, prostate cancer, hepatoma, myeloma, and breast
cancer.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 is a half-tone reproduction of an autoradiogram of a differential display experiment, in which
radiolabeled cDNA corresponding to a subset of total messenger RNA in different cells are compared.
This is used to select cDNA corresponding to particular RNA that are overabundant in breast cancer.

Figure 2 is a half-tone reproduction of an autoradiogram of electrophoresed DNA digests from a
panel of breast cancer cell lines probed with a CH8-2a13-1 insert (Panel A) or a loading control (Panel
B).

Figure 3 is a half-tone reproduction of an autoradiogram of electrophoresed total RNA from a panel of
breast cancer cell lines probed with a CH8-2a13-1 insert (Panel A) or a loading control (Panel B).

Figure 4 is a half-tone reproduction of an autoradiogram of electrophoresed DNA digests from a
panel of breast cancer cell lines probed with a CH13-2a12-1 insert.

Figure 5 is a half-tone reproduction of an autoradiogram of electrophoresed total RNA from a panel of
breast cancer cell lines probed with a CH13-2a12-1 insert.

Figure 6 is a map of cDNA fragments obtained for the breast cancer associated genes CH1-9a11-2,
CH8-2a13-1, CH13-2a12-1 and CH14-2a16-1. Regions of the fragments used to deduce sequence
data listed in the application are indicated by shading. Nucleotide positions are numbered from the
left-most residue for which double-strand sequence data has been obtained, which is not necessarily
the 5' terminus of the corresponding message.

Figure 7 is a listing of primers used for obtaining the cDNA sequence data for CH1-9a11-2.

Figure 8 is a listing of cDNA sequence obtained for CH1-9a11-2.

Figure 9 is a listing of the amino acid sequence corresponding to the longest open reading frame of
the DNA sequence of CH1-9a11-2 shown in Figure 8. The single-letter amino acid code is used.
Stop codons are indicated by a dot (•). The upper panel shows the complete amino acid translation;
the lower panel shows the predicted gene product protein sequence. A possible transmembrane
region is indicated by underlining.

Figure 10 is a listing of primers used for obtaining the cDNA sequence data for CH8-2a13-1.

Figure 11 is a listing of cDNA sequence obtained for CH8-2a13-1.

Figure 12 is a listing of the amino acid sequence corresponding to the longest open reading frame of
the DNA sequence of CH8-2a13-1 shown in Figure 11. The upper panel shows the complete amino
acid translation; the lower panel shows the predicted gene product protein sequence.

Figure 13 is a listing of the nucleotide sequence predicted for a full-length CH8-2a13-1 cDNA.

Figure 14 is a listing of the amino acid sequence corresponding to the longest open reading frame of
the DNA sequence of CH8-2a13-1 shown in Figure 13.

Figure 15 is a listing of primers used for obtaining the cDNA sequence data for CH13-2a12-1.

Figure 16 is a listing of cDNA sequence obtained for CH13-2a12-1.

Figure 17 is a listing of the amino acid sequence corresponding to the longest open reading frame of
the DNA sequence of CH13-2a12-1 shown in Figure 16. The upper panel shows the complete amino
acid translation; the lower panel shows the predicted gene product protein sequence.

Figure 18 is a listing of primers used for obtaining cDNA sequence data for CH13-2a12-1.

Figure 19 is a listing of the cDNA sequence data obtained by two-directional sequencing for
CH14-2a16-1.

Figure 20 is a listing of the amino acid sequence corresponding to the longest open reading frame of
the DNA sequence of CH14-2a16-1 shown in Figure 19. The upper panel shows the complete amino
acid translation; the lower panel shows the predicted gene product protein sequence. Residues
corresponding to three zinc finger motifs are underlined, indicating that the protein may have DNA or
RNA binding activity.

Figure 21 is a listing of additional DNA sequence data towards the 5' end of CH14-2a16-1 obtained
by one-directional sequencing of the fragment pCH14-1.3. The first two panels show nucleotide and
amino acid sequence from the 5' end of the fragment; the second two panels show nucleotide and
amino acid sequence from the 3' end of the fragment. Regions of overlap with pCH14-800 are
underlined.

Figure 22 is a listing of the nucleotide sequences of initial fragments obtained corresponding to the
four breast cancer associated genes, along with their amino acid translations.

Figure 23 is a listing of additional cDNA sequence obtained for CH1-9a11-2, comprising
approximately 1934 base pairs 5' from the sequence of Figure 8.

Figure 24 is a listing of the amino acid sequence corresponding to the longest open reading frame of
the DNA sequence of CH1-9a11-2 shown in Figure 23. The single-letter amino acid code is used.
Stop codons are indicated by a dot (•).

Figure 25 is a listing of additional cDNA sequence obtained for CH14-2a16-1, comprising
approximately 1934 base pairs 5' from the sequence of Figure 19.

Figure 26 is a listing of the amino acid sequence corresponding to the longest open reading frame of
the DNA sequence of CH1-9a11-2 shown in Figure 25. The single-letter amino acid code is used.
Stop codons are indicated by a dot (•). The upper panel shows the complete amino acid translation;
the lower panel shows the predicted gene product protein sequence.



BEST MODE FOR CARRYING OUT THE INVENTION

This invention relates to the discovery and characterization of four novel genes associated
with breast cancer. The cDNA of these genes, and their sequences as disclosed below, provide the
basis of a series of reagents that can be used in diagnosis and therapy.

Using a panel of about 15 cancer cell lines, each of the four genes was found to be duplicated
in 40-60% of the cells tested. Surprisingly, each of the four genes was duplicated in at least one cell
line where studies using comparative genomic hybridization had not revealed any amplification of the
corresponding chromosomal region.

Levels of expression at the mRNA level were tested in a similar panel for two of these four
genes. In addition to those cell lines showing gene duplication, 17 to 37% of the lines showed RNA
overabundance without gene duplication, indicating that the malignant cells had used some
mechanism other than gene duplication to promote the abundance of RNA corresponding to these
genes. All four of the breast cancer genes have open reading frames, and likely are transcribed at
various levels in different cell types. Overabundance of the corresponding RNA in a cancerous cell is
likely associated with overexpression of the protein gene product. Such overexpression may be
manifest as increased secretion of the protein from the cell into blood or the surrounding environment,
an increased density of the protein at the cell surface, or an increased accumulation of the protein within
the cell, in comparison to the typical level in noncancerous cells of the same tissue type.

Different tumors bear different genotypes and phenotypes, even when derived from the same
tissue. Gene therapy in cancer is more likely to be effective if it is aimed at genes that are involved in
supporting the malignancy of the cancer. This invention discloses genes that achieve RNA
overabundance by several mechanisms, because they are more likely to be directly involved in the
pathogenic process, and therefore suitable targets for pharmacological manipulation.

Features of the four novel genes, the respective mRNA, and the cDNA used to find them are
provided in Table 1.

TABLE 1: Characteristics of

Chromosome   Designation    mRNA Observed     Exemplary cDNA Fragments Cloned
----------   -----------    --------------    ----------------------------------
1            CH1-9a11-2     5.5 kb, 4.5 kb    1.1 kb, 2.5 kb
8            CH8-2a13-1     4.2 kb            0.6 kb (two), 3.0 kb, 4.0 kb
13           CH13-2a12-1    3.5 kb, 3.2 kb    1.6 kb, 3.5 kb
14           CH14-2a16-1    3.8 kb, 3 kb      0.8 kb, 1.3 kb, 1.6 kb, 2.5 kb


All four gene sequences are unrelated to other genes known to be overexpressed in breast
cancer, including the erbB2 gene (Adnane et al.), tissue factor (Chen et al.), mammaglobulin (Watson
et al.), and DD96 (Kocher et al.).

The four mRNA sequences each comprise an open reading frame. The CH1-9a11-2 gene is
expressed at the mRNA level at relatively elevated levels in pancreas and testis. The CH8-2a13-1
gene is expressed at relatively elevated levels in adult heart, spleen, thymus, small intestine, colon,
and tissues of the reproductive system; and at higher levels in certain tissues of the fetus. The
CH13-2a12-1 gene is expressed at relatively elevated levels in heart, skeletal muscle, and testis. The
CH14-2a16-1 gene is expressed at relatively elevated levels in testis. The level of expression of all four
genes is especially high in a substantial proportion of breast cancer cell lines.

The CH1-9a11-2 gene encodes a protein with a putative transmembrane region, and may be
expressed as a surface protein on cancer cells. The CH13-2a12-1 gene is distantly related to a C.
elegans gene implicated in cell cycle regulation, and may play a role in the regulation of cell
proliferation. The protein encoded by CH13-2a12-1 is distantly related to a vasopressin-activated
calcium binding receptor, and may have Ca2+ binding activity. The CH14-2a16-1 comprises at least
five domains of a zinc finger binding motif and is distantly related to a yeast RNA binding protein. The
CH14-2a16-1 gene product is suspected of having DNA or RNA binding activity, which may relate to a
role in cancer pathogenesis.

The four genes described here are exemplars of genes that undergo altered expression in
cancer, identifiable using the gene screening methods of the invention. The method involves an
analysis for both DNA duplication and altered RNA abundance relating to the same gene. Since
abnormal gene regulation is central to the malignant process, the identification method may be
brought to bear on any type of cancer.

The screening method is superior to any previously available approach in several respects.
Particularly significant is that screening is rapidly focused towards genes that are central to the
malignant process, and away from those that have variable levels of expression as part of normal
metabolic processes. Furthermore, because the end-product is a cDNA corresponding to the
gene, the process leads rapidly to detailed characterization of the gene, and any effector molecule
it may encode. This in turn leads to development of new diagnostic and therapeutic materials and
techniques.


Definitions

Terms used in this application include the following:

The term "polynucleotide" refers to a polymeric form of nucleotides of any length, either
deoxyribonucleotides or ribonucleotides, or analogs thereof. Polynucleotides may have any
three-dimensional structure, and may perform any function, known or unknown. The following are
non-limiting examples of polynucleotides: a gene or gene fragment, exons, introns, messenger RNA
(mRNA), transfer RNA, ribosomal RNA, ribozymes, cDNA, recombinant polynucleotides, branched
polynucleotides, plasmids, vectors, isolated DNA of any sequence, isolated RNA of any sequence,
nucleic acid probes, and primers. A polynucleotide may comprise modified nucleotides, such as
methylated nucleotides and nucleotide analogs. If present, modifications to the nucleotide structure
may be imparted before or after assembly of the polymer. The sequence of nucleotides may be
interrupted by non-nucleotide components. A polynucleotide may be further modified after
polymerization, such as by conjugation with a labeling component.

The term polynucleotide, as used herein, refers interchangeably to double- and
single-stranded molecules. Unless otherwise specified or required, any embodiment of the invention
described herein that is a polynucleotide encompasses both the double-stranded form, and each of
two complementary single-stranded forms known or predicted to make up the double-stranded form.

In the context of polynucleotides, a "linear sequence" or a "sequence" is an order of
nucleotides in a polynucleotide in a 5' to 3' direction in which residues that neighbor each other in the
sequence are contiguous in the primary structure of the polynucleotide. A "partial sequence" is a
linear sequence of part of a polynucleotide which is known to comprise additional residues in one or
both directions.

"Hybridization" refers to a reaction in which one or more polynucleotides react to form a
complex that is stabilized via hydrogen bonding between the bases of the nucleotide residues. The
hydrogen bonding is sequence-specific, and typically occurs by Watson-Crick base pairing. A
hybridization reaction may constitute a step in a more extensive process, such as the initiation of a
PCR, or the enzymatic cleavage of a polynucleotide by a ribozyme.

Hybridization reactions can be performed under conditions of different "stringency". Relevant
conditions include temperature, ionic strength, time of incubation, the presence of additional solutes in
the reaction mixture such as formamide, and the washing procedure. Higher stringency conditions
are those conditions, such as higher temperature and lower sodium ion concentration, which require
higher minimum complementarity between hybridizing elements for a stable hybridization complex to
form. Conditions that increase the stringency of a hybridization reaction are widely known and
published in the art: see, for example, "Molecular Cloning: A Laboratory Manual", Second Edition
(Sambrook, Fritsch & Maniatis, 1989).

When hybridization occurs in an antiparallel configuration between two single-stranded
polynucleotides, those polynucleotides are described as "complementary". A double-stranded
polynucleotide can be "complementary" to another polynucleotide, if hybridization can occur between
one of the strands of the first polynucleotide and the second. Complementarity (the degree to which one
polynucleotide is complementary with another) is quantifiable in terms of the proportion of bases in
opposing strands that are expected to form hydrogen bonding with each other, according to generally
accepted base-pairing rules.
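
As an illustration only, the proportion described above can be sketched in Python. The function name and the requirement that the two strands have equal length are assumptions made for this sketch and are not part of the disclosure; only Watson-Crick pairs are counted.

```python
# Watson-Crick pairs; U is accepted alongside T so RNA strands can be scored.
PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G"), ("A", "U"), ("U", "A")}


def complementarity(strand_a: str, strand_b: str) -> float:
    """Fraction of positions that base-pair when strand_b lies antiparallel to strand_a.

    Both strands are written 5' to 3'. Equal length is an assumption of this sketch.
    """
    a = strand_a.upper()
    b = strand_b.upper()[::-1]  # reverse strand_b so it faces strand_a 3' to 5'
    if not a or len(a) != len(b):
        raise ValueError("strands must be non-empty and of equal length")
    paired = sum((x, y) in PAIRS for x, y in zip(a, b))
    return paired / len(a)


print(complementarity("ATGC", "GCAT"))  # fully complementary strands
print(complementarity("ATGC", "GCAA"))  # one mismatched position
```

A value of 1.0 corresponds to strands in which every opposing base is expected to pair; lower values indicate partial complementarity.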

A linear sequence of nucleotides is "identical" to another linear sequence, if the order of
nucleotides in each sequence is the same, and occurs without substitution, deletion, or material
substitution. It is understood that purine and pyrimidine nitrogenous bases with similar structures can
be functionally equivalent in terms of Watson-Crick base-pairing; and the inter-substitution of like
nitrogenous bases, particularly uracil and thymine, or the modification of nitrogenous bases, such as
by methylation, does not constitute a material substitution. An RNA and a DNA polynucleotide have
identical sequences when the sequence for the RNA reflects the order of nitrogenous bases in the
polyribonucleotides, the sequence for the DNA reflects the order of nitrogenous bases in the
polydeoxyribonucleotides, and the two sequences satisfy the other requirements of this definition.
Where one or both of the polynucleotides being compared is double-stranded, the sequences are
identical if one strand of the first polynucleotide is identical with one strand of the second
polynucleotide.

A linear sequence of nucleotides is "essentially identical" to another linear sequence, if both
sequences are capable of hybridizing to form a duplex with the same complementary polynucleotide.
Sequences that hybridize under conditions of greater stringency are more preferred. It is understood
that hybridization reactions can accommodate insertions, deletions, and substitutions in the nucleotide
sequence. Thus, linear sequences of nucleotides can be essentially identical even if some of the
nucleotide residues do not precisely correspond or align. In general, essentially identical sequences
of about 40 nucleotides in length will hybridize at about 30°C in 10 x SSC (0.15 M NaCl, 15 mM
citrate buffer); preferably, they will hybridize at about 40°C in 6 x SSC; more preferably, they will
hybridize at about 50°C in 6 x SSC; even more preferably, they will hybridize at about 60°C in 6 x
SSC, or at about 40°C in 0.6 x SSC, or at about 30°C in 6 x SSC containing 50% formamide; still
more preferably, they will hybridize at 40°C or higher in 2 x SSC or lower in the presence of 50% or
more formamide. It is understood that the rigor of the test is partly a function of the length of the
polynucleotide; hence shorter polynucleotides with the same homology should be tested under lower
stringency and longer polynucleotides should be tested under higher stringency, adjusting the
conditions accordingly. The relationship between hybridization stringency, degree of sequence
identity, and polynucleotide length is known in the art and can be calculated by standard formulae;
see, e.g., Meinkoth et al. Sequences that correspond or align more closely to the invention disclosed
herein are comparably more preferred. Generally, essentially identical sequences are at least about
50% identical with each other, after alignment of the homologous regions. Preferably, the sequences
are at least about 60% identical; more preferably, they are at least about 70% identical; more
preferably, they are at least about 80% identical; more preferably, the sequences are at least about
90% identical; even more preferably, they are at least 95% identical; still more preferably, the
sequences are 100% identical. Percent identity is calculated as the percent of residues in the
sequence being compared that are identical to those in the reference sequence, which is usually one
of those listed or described in this application, unless stated otherwise. No penalty is imposed for
introduction of gaps in the reference or comparison sequence for purposes of alignment, but the
resulting fragments must be rationally derived; small gaps may not be introduced to trivially improve
the identity score.
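
The percent identity calculation may be illustrated with a short Python sketch. It assumes the two sequences have already been aligned to equal length, with a hyphen marking each gap, and it reads "no penalty for gaps" as meaning that gapped columns are left out of both the numerator and the denominator. The function name and the gap symbol are hypothetical choices made for the example, and the alignment step itself is not shown.

```python
def percent_identity(reference: str, compared: str, gap: str = "-") -> float:
    """Percent of aligned residues in `compared` that match `reference`.

    Inputs are pre-aligned strings of equal length. Columns containing a gap
    in either sequence are skipped, so gaps carry no penalty (an assumption
    about how the rule is applied).
    """
    if len(reference) != len(compared):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    scored = 0
    for r, c in zip(reference.upper(), compared.upper()):
        if r == gap or c == gap:
            continue  # gapped column: neither counted nor penalized
        scored += 1
        if r == c:
            matches += 1
    if scored == 0:
        raise ValueError("no aligned residue pairs to score")
    return 100.0 * matches / scored


print(percent_identity("ATGCATGC", "ATGCATGG"))  # 7 of 8 aligned residues match
print(percent_identity("ATG-CA", "ATGGCT"))      # gapped column is ignored
```

Because gaps are free in this reading, the requirement that fragments be rationally derived has to be enforced when the alignment is made; the sketch does not check it.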

In determining whether polynucleotide sequences are essentially identical, a sequence that
preserves the functionality of the polynucleotide with which it is being compared is particularly
preferred. Functionality may be established by different criteria, such as ability to hybridize with a
target polynucleotide, and whether the polynucleotide encodes an identical or essentially identical
polypeptide. Thus, nucleotide substitutions which cause a non-conservative substitution in the
encoded polypeptide are preferred over nucleotide substitutions that create a stop codon; nucleotide
substitutions that cause a conservative substitution in the encoded polypeptide are more preferred,
and identical nucleotide sequences are even more preferred. Insertions or deletions in the
polynucleotide that result in insertions or deletions in the polypeptide are preferred over those that
result in the down-stream coding region being rendered out of phase. The relative importance of
hybridization properties and the polypeptide encoded by a polynucleotide depends on the application
of the invention.

A "reagent" polynucleotide, polypeptide, or antibody, is a substance provided for a reaction,
the substance having some known and desirable parameters for the reaction. A reaction mixture may
also contain a "target", such as a polynucleotide, antibody, or polypeptide that the reagent is capable
of reacting with. For example, in some types of diagnostic tests, the amount of the target in a sample
is determined by adding a reagent, allowing the reagent and target to react, and measuring the
amount of reaction product. In the context of clinical management, a "target" may also be a cell,
collection of cells, tissue, or organ that is the object of an administered substance, such as a
pharmaceutical compound.

"cDNA" or "complementary DNA" is a single- or double-stranded DNA polynucleotide in which
one strand is complementary to a messenger RNA. "Full-length cDNA" is cDNA comprised of a strand
which is complementary to an entire messenger RNA molecule. A "cDNA fragment" as used herein
generally represents a sub-region of the full-length form, but the entire full-length cDNA may also be
included. Unless explicitly specified, the term cDNA encompasses both the full-length form and the
fragment form.

Different polynucleotides are said to "correspond" to each other if one is ultimately derived
from another. For example, messenger RNA corresponds to the gene from which it is transcribed.
cDNA corresponds to the RNA from which it has been produced, such as by a reverse transcription
reaction, or by chemical synthesis of a DNA based upon knowledge of the RNA sequence. cDNA
also corresponds to the gene that encodes the RNA. Polynucleotides may be said to correspond
even when one of the pair is derived from only a portion of the other.

A "probe" when used in the context of polynucleotide manipulation refers to a polynucleotide
which is provided as a reagent to detect a target potentially present in a sample of interest by
hybridizing with the target. Usually, a probe will comprise a label or a means by which a label can be
attached, either before or subsequent to the hybridization reaction. Suitable labels include, but are not
limited to, radioisotopes, fluorochromes, chemiluminescent compounds, dyes, and enzymes.

A "primer" is a short polynucleotide, generally with a free 3' -OH group, that binds to a target
potentially present in a sample of interest by hybridizing with the target, and thereafter promoting
polymerization of a polynucleotide complementary to the target. A "polymerase chain reaction"
("PCR") is a reaction in which replicate copies are made of a target polynucleotide using one or more
primers, and a catalyst of polymerization, such as a reverse transcriptase or a DNA polymerase, and
particularly a thermally stable polymerase enzyme. Methods for PCR are taught in U.S. Patent Nos.
4,683,195 (Mullis) and 4,683,202 (Mullis et al.). All processes of producing replicate copies of the
same polynucleotide, such as PCR or gene cloning, are collectively referred to herein as "replication."

An "operon" is a genetic region comprising a gene encoding a protein and functionally related
5' and 3' flanking regions. Elements within an operon include but are not limited to promoter regions,
enhancer regions, repressor binding regions, transcription initiation sites, ribosome binding sites,
translation initiation sites, protein encoding regions, introns and exons, and termination sites for
transcription and translation. A "promoter" is a DNA region capable under certain conditions of
binding RNA polymerase and initiating transcription of a coding region located downstream (in the 3'
direction) from the promoter. "Operably linked" refers to a juxtaposition of genetic elements, wherein
the elements are in a relationship permitting them to operate in the expected manner. For instance, a
promoter is operably linked to a coding region if the promoter helps initiate transcription of the coding
sequence. There may be intervening residues between the promoter and coding region so long as
this functional relationship is maintained.

"Gene duplication" is a term used herein to describe the process whereby an increased
number of copies of a particular gene or a fragment thereof is present in a particular cell or cell line.
"Gene amplification" generally is synonymous with gene duplication.

"Expression" is defined alternately in the scientific literature either as the transcription of a
gene into an RNA polynucleotide, or as the transcription and subsequent translation into a
polypeptide. As used herein, "expression" or "gene expression" generally refers to the production of
the RNA unless specified or required otherwise. Thus, "RNA overexpression" reflects the presence of
more RNA (as a proportion of total RNA) from a particular gene in a cell being described, such as a
cancerous cell, in relation to that of the cell it is being compared with, such as a non-cancerous cell.
The protein product of the gene may or may not be produced in normal or abnormal amounts.
"Protein overexpression" similarly reflects the presence of relatively more protein present in or
produced by, for example, a cancerous cell.

"Abundance" of RNA refers to the amount of a particular RNA present in a particular cell type.
Thus, "RNA overabundance" or "overabundance of RNA" describes RNA that is present in greater
proportion of total RNA in the cell type being described, compared with the same RNA as a proportion
of the total RNA in a control cell. A number of mechanisms may contribute to RNA overabundance in
a particular cell type: for example, gene duplication, increased level of transcription of the gene,
increased persistence of the RNA within the cell after it is produced, or any combination of these.
Similarly, "lower abundance" or "underabundance" describes RNA that is present in lower
proportion in the cell being described compared with a control cell.
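
By way of illustration, the comparison underlying these definitions can be written as a ratio of proportions. The Python sketch below assumes that the particular RNA and total RNA have each been measured in the same arbitrary units for the cell being described and for the control cell; the function names and the cut-off of 1.0 are assumptions made for the example, and no threshold is specified in this definition.

```python
def relative_abundance(target_test: float, total_test: float,
                       target_control: float, total_control: float) -> float:
    """Ratio of (target RNA / total RNA) in the test cell to the same
    proportion in the control cell.

    Cross-multiplied form of (target_test / total_test) divided by
    (target_control / total_control).
    """
    if min(target_control, total_test, total_control) <= 0 or target_test < 0:
        raise ValueError("measurements must be positive (target_test may be zero)")
    return (target_test * total_control) / (total_test * target_control)


def describe(ratio: float) -> str:
    """Label a ratio using an illustrative cut-off of 1.0."""
    if ratio > 1.0:
        return "overabundant"
    if ratio < 1.0:
        return "underabundant"
    return "unchanged"


# Hypothetical signal values, e.g. from a blot normalized to a loading control.
ratio = relative_abundance(30.0, 1000.0, 10.0, 1000.0)
print(ratio, describe(ratio))
```

In practice a margin above 1.0 would normally be required before an RNA is called overabundant, to allow for measurement variation.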

The terms "polypeptide", "peptide" and "protein" are used interchangeably herein to refer to
polymers of amino acids of any length. The polymer may be linear or branched, it may comprise
modified amino acids, and it may be interrupted by non-amino acids. The terms also encompass an
amino acid polymer that has been modified; for example, disulfide bond formation, glycosylation,
lipidation, acetylation, phosphorylation, or any other manipulation, such as conjugation with a labeling
component.

In the context of polypeptides, a "linear sequence" or a "sequence" is an order of amino acids
in a polypeptide in an N-terminal to C-terminal direction in which residues that neighbor each other in
the sequence are contiguous in the primary structure of the polypeptide. A "partial sequence" is a
linear sequence of part of a polypeptide which is known to comprise additional residues in one or both
directions.

A linear sequence of amino acids is "essentially identical" to another sequence if the two
sequences have a substantial degree of sequence identity. It is understood that functional
proteins can accommodate insertions, deletions, and substitutions in the amino acid sequence. Thus,
linear sequences of amino acids can be essentially identical even if some of the residues do not
precisely correspond or align. Sequences that correspond or align more closely to the invention
disclosed herein are more preferred. It is also understood that some amino acid substitutions are
more easily tolerated. For example, substitution of an amino acid with hydrophobic side chains,
aromatic side chains, polar side chains, side chains with a positive or negative charge, or side chains
comprising two or fewer carbon atoms, by another amino acid with a side chain of like properties can
occur without disturbing the essential identity of the two sequences. Methods for determining
homologous regions and scoring the degree of homology are well known in the art; see for example
Altschul et al. and Henikoff et al. Well-tolerated sequence differences are referred to as "conservative
substitutions". Thus, sequences with conservative substitutions are preferred over those with other
substitutions in the same positions; sequences with identical residues at the same positions are still
more preferred. In general, amino acid sequences that are essentially identical are at least about
15% identical, and comprise at least about another 15% which are either identical or are conservative
substitutions, after alignment of homologous regions. More preferably, essentially identical
sequences comprise at least about 50% identical residues or conservative substitutions; more
preferably, they comprise at least about 70% identical residues or conservative substitutions; more
preferably, they comprise at least about 80% identical residues or conservative substitutions; more
preferably, they comprise at least about 90% identical residues or conservative substitutions; more
preferably, they comprise at least about 95% identical residues or conservative substitutions; even
more preferably, they contain 100% identical residues.
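
The scoring of identical residues and conservative substitutions can be sketched as follows. The assignment of individual amino acids to side-chain classes below is an assumption made for illustration, since the definition names the classes but does not list their members; the function name is likewise hypothetical. The sequences are assumed to be pre-aligned, with a hyphen marking gaps, and gapped columns are skipped.

```python
# Illustrative side-chain classes (an assumption, not taken from the disclosure).
CLASSES = [
    set("AVLIM"),   # hydrophobic
    set("FWY"),     # aromatic
    set("STNQ"),    # polar
    set("KRH"),     # positively charged
    set("DE"),      # negatively charged
    set("GASCT"),   # two or fewer side-chain carbon atoms
]


def is_conservative(x: str, y: str) -> bool:
    """True if two different residues share at least one side-chain class."""
    return x != y and any(x in c and y in c for c in CLASSES)


def residue_similarity(seq_a: str, seq_b: str, gap: str = "-"):
    """Return (% identical, % conservative substitutions) over aligned columns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    identical = conservative = scored = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x == gap or y == gap:
            continue  # gapped columns are not scored
        scored += 1
        if x == y:
            identical += 1
        elif is_conservative(x, y):
            conservative += 1
    if scored == 0:
        raise ValueError("no aligned residue pairs to score")
    return 100.0 * identical / scored, 100.0 * conservative / scored


print(residue_similarity("LKDF", "IKEY"))  # one identity, three conservative changes
```

Published substitution matrices, such as those described in the references cited above, grade substitutions on a continuous scale; the class-membership test here is a simpler stand-in.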

In determining whether polypeptide sequences are essentially identical, a sequence that
preserves the functionality of the polypeptide with which it is being compared is particularly preferred.
Functionality may be established by different parameters, such as enzymatic activity, the binding rate
or affinity in a receptor-ligand interaction, the binding affinity with an antibody, and X-ray
crystallographic structure.

An "antibody" (interchangeably used in plural form) is an immunoglobulin molecule capable of
specific binding to a target, such as a polypeptide, through at least one antigen recognition site,
located in the variable region of the immunoglobulin molecule. As used herein, the term
encompasses not only intact antibodies, but also fragments thereof, mutants thereof, fusion proteins,
humanized antibodies, and any other modified configuration of the immunoglobulin molecule that
comprises an antigen recognition site of the required specificity.

The term "antigen" refers to the target molecule that is specifically bound by an antibody
through its antigen recognition site. The antigen may, but need not, be chemically related to the
immunogen that stimulated production of the antibody. The antigen may be polyvalent or it may be a
monovalent hapten. Examples of kinds of antigens that can be recognized by antibodies include
polypeptides, polynucleotides, other antibody molecules, oligosaccharides, complex lipids, drugs, and
chemicals. An "immunogen" is an antigen capable of stimulating production of an antibody when
injected into a suitable host, usually a mammal. Compounds may be rendered immunogenic by many
techniques known in the art, including crosslinking or conjugating with a carrier to increase valency,
mixing with a mitogen to increase the immune response, and combining with an adjuvant to enhance
presentation.

An "active vaccine" is a pharmaceutical preparation for human or animal use, which is used
with the intention of eliciting a specific immune response. The immune response may be either
humoral or cellular, systemic or secretory. The immune response may be desired for experimental
purposes, for the treatment of a particular condition, for the elimination of a particular substance, or for
prophylaxis against a particular condition or substance.

An "isolated" polynucleotide, polypeptide, protein, antibody, or other substance refers to a
preparation of the substance devoid of at least some of the other components that may also be
present where the substance or a similar substance naturally occurs or is initially obtained from.
Thus, for example, an isolated substance may be prepared by using a purification technique to enrich
it from a source mixture. Enrichment can be measured on an absolute basis, such as weight per
volume of solution, or it can be measured in relation to a second, potentially interfering substance
present in the source mixture. Increasing enrichments of the embodiments of this invention are
increasingly more preferred. Thus, for example, a 2-fold enrichment is preferred, 10-fold enrichment
is more preferred, 100-fold enrichment is more preferred, 1000-fold enrichment is even more
preferred. A substance can also be provided in an isolated state by a process of artificial assembly,
such as by chemical synthesis or recombinant expression.
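
The two ways of measuring enrichment described above can be illustrated with a short Python sketch. The function names, the units, and the example values are hypothetical; the sketch simply divides the measure after purification by the measure before purification.

```python
def absolute_fold_enrichment(conc_before: float, conc_after: float) -> float:
    """Fold enrichment on an absolute basis, e.g. weight per volume of solution."""
    if conc_before <= 0 or conc_after < 0:
        raise ValueError("concentrations must be positive")
    return conc_after / conc_before


def relative_fold_enrichment(target_before: float, interferent_before: float,
                             target_after: float, interferent_after: float) -> float:
    """Fold enrichment of the target relative to a second, potentially
    interfering substance present in the source mixture."""
    if min(target_before, interferent_before, interferent_after) <= 0:
        raise ValueError("amounts must be positive")
    ratio_before = target_before / interferent_before
    ratio_after = target_after / interferent_after
    return ratio_after / ratio_before


# Hypothetical example: 0.5 mg/ml in the source mixture, 50 mg/ml after purification.
print(absolute_fold_enrichment(0.5, 50.0))
# Hypothetical example: target-to-interferent ratio rises from 1:4 to 8:2.
print(relative_fold_enrichment(1.0, 4.0, 8.0, 2.0))
```

The two measures need not agree: a preparation may be concentrated many-fold on an absolute basis while the ratio to an interfering substance changes little.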

A polynucleotide used in a reaction, such as a probe used in a hybridization reaction, a primer
used in a PCR, or a polynucleotide present in a pharmaceutical preparation, is referred to as "specific"
or "selective" if it hybridizes or reacts with the intended target more frequently, more rapidly, or with
greater duration than it does with alternative substances. Similarly, an antibody is referred to as
"specific" or "selective" if it binds via at least one antigen recognition site to the intended target more
frequently, more rapidly, or with greater duration than it does to alternative substances. A
polynucleotide or antibody is said to "selectively inhibit" or "selectively interfere with" a reaction if it
inhibits or interferes with the reaction between particular substrates to a greater degree or for a
greater duration than it does with the reaction between alternative substrates. An antibody is capable
of "specifically delivering" a substance if it conveys or retains that substance near a particular cell type
more frequently or for a greater duration compared with other cell types.

The "effector componenf of a phanmaceutical preparation is a confqx>nent which modifies 

15 target cells by altering their function in a desirable way when administered to a subject bearing the 
cells. Some advanced phanmaceutical preparattons also have a "targeting componenf. such as an 
antitxxly, which helps deliver the effiector component more efficaciously to the target site. Depending 
on the desired action, the effector component may have any one of a number of modes of action. For 
example, it may restore or enhance & normal function of a cell, it may eliminate or suppress an 

20 abnormal function of a cell, or it may alter a cell's phenotype. Alternatively, it may kill or render 
dormant a celt with pathological features, such as a cancer cell. Examples of effector components are 
provided in a later section. 

A "pharmaceutical candidate" or "drug candidate" is a compound believed to have therapeutic
potential, that is to be tested for efficacy. The "screening" of a pharmaceutical candidate refers to
conducting an assay that is capable of evaluating the efficacy and/or specificity of the candidate. In
this context, "efficacy" refers to the ability of the candidate to affect the cell or organism it is
administered to in a beneficial way: for example, the limitation of the pathology of cancerous cells.

A "cell line" or "cell culture" denotes higher eukaryotic cells grown or maintained in vitro. It is
understood that the descendants of a cell may not be completely identical (either morphologically,
genotypically, or phenotypically) to the parent cell. Cells described as "uncultured" are obtained
directly from a living organism, and have been maintained for a limited amount of time away from the
organism: not long enough or under conditions for the cells to undergo substantial replication.

"Genetic alteration" refers to a process wherein a genetic element is introduced into a cell
other than by mitosis or meiosis. The element may be heterologous to the cell, or it may be an
additional copy or improved version of an element already present in the cell. Genetic alteration
may be effected, for example, by transfecting a cell with a recombinant plasmid or other
polynucleotide through any process known in the art, such as electroporation, calcium phosphate
precipitation, or contacting with a polynucleotide-liposome complex, or by transduction or infection
with a DNA or RNA virus or viral vector. The alteration is preferably but not necessarily inheritable
by progeny of the altered cell.

A "host cell" is a cell which has been genetically altered, or is capable of being genetically
altered, by administration of an exogenous polynucleotide.

The terms "cancerous cell" or "cancer cell", used either in the singular or plural form, refer to
cells that have undergone a malignant transformation that makes them pathological to the host
organism. Malignant transformation is a single- or multi-step process, which involves in part an
alteration in the genetic makeup of the cell and/or the expression profile. Malignant transformation
may occur either spontaneously, or via an event or combination of events such as drug or chemical
treatment, radiation, fusion with other cells, viral infection, or activation or inactivation of particular
genes. Malignant transformation may occur in vivo or in vitro, and can if necessary be experimentally
induced.

A frequent feature of cancer cells is the tendency to grow in a manner that is uncontrollable
by the host, but the pathology associated with a particular cancer cell may take another form, as
outlined infra. Primary cancer cells (that is, cells obtained from near the site of malignant
transformation) can be readily distinguished from non-cancerous cells by well-established techniques,
particularly histological examination. The definition of a cancer cell, as used herein, includes not only
a primary cancer cell, but any cell derived from a cancer cell ancestor. This includes metastasized
cancer cells, and in vitro cultures and cell lines derived from cancer cells.

The "pathology" caused by a cancer cell within a host is anything that compromises the
well-being or normal physiology of the host. This may involve (but is not limited to) abnormal or
uncontrollable growth of the cell, metastasis, release of cytokines or other secretory products at an
inappropriate level, manifestation of a function inappropriate for its physiological milieu, interference
with the normal function of neighboring cells, aggravation or suppression of an inflammatory or
immunological response, or the harboring of undesirable chemical agents or invasive organisms.

"Treatment" of an individual or a cell is any type of intervention in an attempt to alter the
natural course of the individual or cell. For example, treatment of an individual may be undertaken to
decrease or limit the pathology caused by a cancer cell harbored in the individual. Treatment includes
(but is not limited to) administration of a composition, such as a pharmaceutical composition, and may
be performed either prophylactically, or subsequent to the initiation of a pathologic event or contact
with an etiologic agent. Effective amounts used in treatment are those which are sufficient to
produce the desired effect, and may be given in single or divided doses.

A "control cell" is an alternative source of cells or an alternative cell line used in an experiment
for comparison purposes. Where the purpose of the experiment is to establish a base line for gene
copy number or expression level, it is generally preferable to use a control cell that is not a cancer
cell.

The term "cancer gene" as used herein refers to any gene which is yielding transcription or
translation products at a substantially altered level or in a substantially altered form in cancerous cells
compared with non-cancerous cells, and which may play a role in supporting the malignancy of the
cell. It may be a normally quiescent gene that becomes activated (such as a dominant
proto-oncogene), it may be a gene that becomes expressed at an abnormally high level (such as a
growth factor receptor), it may be a gene that becomes mutated to produce a variant phenotype, or it
may be a gene that becomes expressed at an abnormally low level (such as a tumor suppressor
gene). The present invention is directed towards the discovery of genes in all these categories.

It is understood that a "clinical sample" encompasses a variety of sample types obtained from
a subject and useful in an in vitro procedure, such as a diagnostic test. The definition encompasses
solid tissue samples obtained as a surgical removal, a pathology specimen, or a biopsy specimen,
tissue cultures or cells derived therefrom and the progeny thereof, and sections or smears prepared
from any of these sources. Non-limiting examples are samples obtained from breast tissue, lymph
nodes, and tumors. The definition also encompasses blood, spinal fluid, and other liquid samples of
biologic origin, and may refer to either the cells or cell fragments suspended therein, or to the liquid
medium and its solutes.

The term "relative amount" is used where a comparison is made between a test
measurement and a control measurement. Thus, the relative amount of a reagent forming a complex
in a reaction is the amount reacting with a test specimen, compared with the amount reacting with a
control specimen. The control specimen may be run separately in the same assay, or it may be part
of the same sample (for example, normal tissue surrounding a malignant area in a tissue section).

A "differential" result is generally obtained from an assay in which a comparison is made
between the findings of two different assay samples, such as a cancerous cell line and a control cell
line. Thus, for example, "differential expression" is observed when the level of expression of a
particular gene is higher in one cell than another. "Differential display" refers to a display of a
component, particularly RNA, from different cells to determine if there is a difference in the level of the
component amongst different cells. Differential display of RNA is conducted, for example, by selective
production and display of cDNA corresponding thereto. A method for performing differential display is
provided in a later section.

A polynucleotide derived from or corresponding to CH1-9a11-2, CH8-2a13-1, CH13-2a12-1,
or CH14-2a16-1 is any of the following: the respective cDNA fragments, the corresponding
messenger RNA, including splice variants and fragments thereof, both strands of the corresponding
full-length cDNA and fragments thereof, and the corresponding gene. Isolated allelic variants of any
of these forms are included. This invention embodies any polynucleotide corresponding to CH1-9a11-2,
CH8-2a13-1, CH13-2a12-1, or CH14-2a16-1 in an isolated form. It also embodies any such
polynucleotide that has been cloned or transfected into a cell line.

When used in referring to the gene screening methods of this invention (such as those
outlined in the last paragraph), "displaying cDNA" is any technique in which DNA copies of RNA
(not restricted to mRNA) are rendered detectable in a quantitative or relatively quantitative fashion, in
that DNA copies present in a relatively greater amount in a first sample compared with a second
sample generate a relatively stronger or weaker signal compared with that of the second sample
due to the difference in copy number. Separate display of different cDNA in a preparation
(particularly but not limited to cDNA of different size) allows comparison of levels of a particular
cDNA between different samples. A preferred method of display is the differential display
technique, and enhancements thereupon described in this disclosure and elsewhere.

The term "digested" DNA encompasses DNA (particularly chromosomal DNA) that has
been fragmented by any suitable chemical or enzymatic means into fragments conveniently
separable by standard techniques, particularly gel electrophoresis. Digestion with a restriction
endonuclease specific for a particular nucleotide sequence is preferred.

"Hybridizing" in this context refers to contacting a first polynucleotide with a second
polynucleotide under conditions that permit the formation of a multi-stranded polynucleotide duplex
whenever one strand of the first polynucleotide has a sequence of sufficient complementarity to a
sequence on the second polynucleotide. The duplex may be a long-lived one, such as when one
DNA molecule is used as a labeled probe to detect another DNA molecule, that may optionally be
bound to a nitrocellulose filter or present in a separating gel. The duplex may also be a shorter-lived
one, such as when one DNA molecule is used to prime an amplification reaction of the other
DNA molecule, and the amplified product is subsequently detected. The practitioner may alter the
conditions of the reaction to alter the degree of complementarity required, as long as sequence
specificity remains a determining factor in the reaction.

Unless explicitly indicated or otherwise required by the techniques used, the steps of a
method of this invention may be performed in any order, or combined where desired and
appropriate. In one example, in the method comprising steps a) through h) that is described
above, it is entirely appropriate to conduct steps a) to c) of the method either before or after steps
e) to g) of the method, as long as the cDNA ultimately selected fulfills the criteria of both step d)
and step h). In another example, screening against different digested DNA preparations, even if
outlined separately, may optionally be done at the same time. All permutations of this kind are
within the scope of the invention.

General methods

The practice of the present invention will employ, unless otherwise indicated, conventional
techniques of molecular biology, microbiology, recombinant DNA, and immunology, which are within
the skill of the art. Such techniques are explained fully in the literature. See, for example, "Molecular
Cloning: A Laboratory Manual", Second Edition (Sambrook, Fritsch & Maniatis, 1989);
"Oligonucleotide Synthesis" (M.J. Gait, ed., 1984); "Animal Cell Culture" (R.I. Freshney, ed., 1987);
the series "Methods in Enzymology" (Academic Press, Inc.); "Handbook of Experimental Immunology"
(D.M. Weir & C.C. Blackwell, eds.); "Gene Transfer Vectors for Mammalian Cells" (J.M. Miller & M.P.
Calos, eds., 1987); "Current Protocols in Molecular Biology" (F.M. Ausubel et al., eds., 1987); and
"Current Protocols in Immunology" (J.E. Coligan et al., eds., 1991). All patents, patent applications,
articles and publications mentioned herein, both supra and infra, are hereby incorporated herein by
reference.

Features of the cancer gene screening method 

The cancer gene screening methods of this invention may be brought to bear to discover
novel genes associated with cancer. Exemplars of cancer-associated genes identified by this
method are described below. The exemplars were identified using breast cancer cell lines and
tissue, but the strategy can be applied to any cancer type of interest.

A central feature of the cancer gene screening method of this invention is to look for both
DNA duplication and RNA overabundance relating to the same gene. This feature is particularly
powerful in the discovery of new and potentially important cancer genes. While amplicons occur
frequently in cancer, the presently available techniques indicate only the broad chromosomal
region involved in the duplication event, not the specific genes involved. The present invention
provides a way of detecting genes that may be present in an amplicon from a functional basis.
Because an early part of the method involves detecting RNA, the method avoids genes that may
be duplicated in an amplicon but are quiescent (and therefore irrelevant) in the cancer cells.
Furthermore, it recruits active genes from a duplicated region of the chromosome too small to be
detectable by the techniques used to describe amplicons.

Near the heart of this approach are several concepts. One is that genes encoding
products implicated positively in the malignant process achieve elevated gene expression as a part
of malignant transformation. In this context, "gene expression" refers to expression at the RNA
transcription level. Most typically, the RNA is in turn translated into a protein with a particular
enzymatic, binding, or regulatory activity which increases after malignant transformation. In a less
common example, the RNA may encode or participate as a ribozyme, antisense polynucleotide, or
other functional nucleic acid molecule during malignancy. In a third example, RNA expression may
be incidental but symptomatic of an important event in transformation.

Another concept is that overexpression, if central to malignant transformation, may be
achieved in different tumors by different mechanisms, and that at least one such possible
mechanism is gene duplication. Accordingly, a substantial proportion of transformed cells will have
an amplicon, or duplicated region of a chromosome, that includes within its compass the
overexpressed gene. Other transformed cells may achieve RNA overabundance without gene
duplication, such as by increasing the rate of transcription of the gene (e.g., by upregulation of the
promoter region), by enhancing transcript promotion or transport, or by increasing mRNA survival.

Thus, the method entails screening, at the RNA level, several cancer cell lines or tumors
and several normal cell lines or tissue samples at the same time. RNA are selected that show a
consistent elevation amongst the cancer cells as compared with normal cells. Additional strategies
may be employed in combination with the RNA screening to improve the success rate of the
method. One such strategy is to use several cancer cell lines that are all known to have duplicated
genes in the same region of a particular chromosome. Thus, the RNA that emerge from the screen
are more likely to represent a deliberate overexpression event, and the overexpressed gene is
likely to be within the duplicated region. A supplemental strategy is to use freshly prepared tissue
samples rather than cell lines as controls for base-line expression. This avoids selection of genes
that may alter their expression level just as a result of tissue culturing. Another supplemental
strategy is to conduct an additional level of screening, following identification of shared,
overexpressed RNA. The selected RNA are used to screen DNA from suitable cancer cells and
normal cells, to ensure that at least a proportion of the cells achieved the overexpression by way of
gene duplication.

The strategy for detecting such genes comprises a number of innovations over those that
have been used in previous work.

The first part of the method is based on a search for particular RNAs that are overabundant
in cancer cells. A first innovation of the method is to compare RNA abundance between control
cells and several different cancer cells or cancer cell lines of the desired type. The cDNA
fragments that emerge in a greater amount in several different cancer lines, but not in control cells,
are more likely to reflect genes that are important in disease progression, rather than those that
have undergone secondary or coincidental activation. It is particularly preferred to use cancer cells
that are known to share a common duplicated chromosomal region.

A second innovation of this method is to supply as control, not RNA from a cell line or
culture, but from fresh tissue samples of non-malignant origin. There are two reasons for this.
First, the tissue will provide the spectrum of expression that is typical of the normal cell phenotype,
rather than individual differences that may become more prominent in culture. This establishes a
more reliable baseline for normal expression levels. More importantly, the tissue will be devoid of
the effects that in vitro culturing may have in altering or selecting particular phenotypes. For
example, proto-oncogenes or growth factors may become up-regulated in culture. When cultured
cells are used as the control for differential display, these up-regulated genes would be missed.

A third innovation of this method is to undertake a subselection for cDNA corresponding to
genes that achieve their RNA overabundance in a substantial proportion of cancer cells by gene
duplication. To accomplish this, appropriate cDNA corresponding to overabundant RNA identified
in the foregoing steps are used to probe digests of cellular DNA from a panel of different cancer
cells, and from normal genomic DNA. cDNA that show evidence of higher copy numbers in a
proportion of the panel are selected for further characterization. An additional advantage of this
step is that cDNA corresponding to mitochondrial genes can rapidly be screened away by including
a mitochondrial DNA digest as an additional sample for testing the probe. This eliminates most of
the false-positive cDNA, which otherwise make up a majority of the cDNA identified.
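
The subselection logic of this third step can be illustrated with a minimal Python sketch. It is an illustration only and not part of the disclosed method: the signal values, the sample labels ("normal", "mito", "cancer_1" and so on), and the thresholds min_fold and min_duplicated are all hypothetical, chosen merely to show how probes with elevated copy number in part of the panel might be kept while probes that hybridize to the mitochondrial digest are discarded.

```python
def subselect_probes(signals, panel, normal="normal", mito="mito",
                     min_fold=2.0, min_duplicated=1):
    """Return probes that look duplicated in part of the panel.

    signals: dict mapping probe name -> dict of sample label -> hybridization signal.
    panel:   list of cancer sample labels to inspect.
    All thresholds are illustrative assumptions, not values from the text.
    """
    selected = []
    for probe, by_sample in signals.items():
        # Any signal on the mitochondrial digest marks the probe as a likely false positive.
        if by_sample.get(mito, 0.0) > 0.0:
            continue
        baseline = by_sample[normal]
        # Count panel members whose signal exceeds the normal genomic DNA signal by min_fold.
        duplicated = sum(1 for s in panel if by_sample[s] >= min_fold * baseline)
        if duplicated >= min_duplicated:
            selected.append(probe)
    return selected


# Hypothetical example data (arbitrary units).
panel = ["cancer_1", "cancer_2", "cancer_3"]
signals = {
    "probe_1": {"normal": 1.0, "mito": 0.0, "cancer_1": 4.0, "cancer_2": 1.1, "cancer_3": 3.0},
    "probe_2": {"normal": 1.0, "mito": 5.0, "cancer_1": 6.0, "cancer_2": 6.0, "cancer_3": 6.0},
    "probe_3": {"normal": 1.0, "mito": 0.0, "cancer_1": 1.0, "cancer_2": 0.9, "cancer_3": 1.2},
}
print(subselect_probes(signals, panel))
```

In this made-up data set only probe_1 is retained: probe_2 is dropped because it lights up the mitochondrial digest, and probe_3 shows no elevated copy number in any panel member.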

Thus, the identification of genes yielding products that are present at abnormal levels is
accomplished by a method comprised of the following steps.

To identify particular RNA that is overabundant in cancer cells, RNA is prepared from both
cancerous and control cells by standard techniques. Cancer-associated genes may affect cellular
metabolism by any one of a number of mechanisms; for example, they may encode ribozymes,
antisense polynucleotides, DNA-binding polynucleotides, altered ribosomal RNA, and the like.
The gene screening methods of this invention may employ a comparison of RNA abundance levels
at the total RNA level, not strictly limited to mRNA. However, the vast majority of cancer-associated
genes are predicted to encode a protein whose up-regulation is closely linked to
the metabolic process. For example, the four exemplary breast cancer genes described elsewhere
in this application all comprise an open reading frame. Accordingly, a focus on mRNA enriches the
selectable pool for candidate cancer-associated genes. Focus towards mRNA can be conducted
at any step in the method. It is particularly convenient to use a display method that displays cDNA
copied only from mRNA. In this case, whole RNA may be prepared and analyzed from cancer and
control cell populations without separating out mRNA.

In terms of the cancer cells used as an RNA source, it is particularly advantageous to use
a plurality of cancer cells known to contain a duplicated gene or chromosomal segment in the same
region of the chromosome. The duplicated segment need not be the same size in all the cells, nor
is it necessary that the number of duplications be the same, so long as there is at least some part
of the duplicated segment that is shared amongst all the cancer cells used in the screen. Thus, a
minimum of two, and preferably at least three cancer cells are used that are sufficiently
characterized to identify a shared duplicated region, and can be used as a source of RNA for the
screening test. In contrast, the control cell population will not comprise chromosomal duplications.

Assuming the duplication to be related to the malignancy of the cancer cells, RNA
transcribed from the duplicated region is expected to be overabundant compared with that of the
control cell. Accordingly, a highly effective strategy is to identify overabundant RNA that is present
in all (or at least several) of the cancer cell preparations, but none of the control preparations. By
using cancer cells that share a duplicated chromosomal region, the RNA comparison will be
strongly biased in favor of RNA overabundance transcribed from the shared duplicated region.
Since the shared region is optimally only a small segment of a single chromosome, expression
differences arising from elsewhere in the genome in one cancer cell or another will not be selected.
We have found that this is highly effective in eliminating: a) RNA abundance differences resulting
from normal metabolic variations between cells; and/or b) RNA abundance differences related to
cancer cell malignancy, but occurring secondarily to malignant transformation. This is important,
because it considerably minimizes the chief deficiency in the use of RNA comparison methods,
particularly differential display, for the screening of potential cancer genes: namely, the onerous
number of false-positives that such techniques generate.
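
The selection rule described above (overabundant in all, or at least several, of the cancer preparations, but in none of the controls) can be written as a short Python sketch. This is a hedged illustration rather than the procedure actually used: the band names, sample labels, and the min_cancer parameter are hypothetical, and each band is assumed to have already been scored as overabundant or not in each lane.

```python
def select_shared_bands(overabundant_in, cancer, control, min_cancer=None):
    """Pick bands scored overabundant in enough cancer samples and in no control.

    overabundant_in: dict mapping band name -> set of sample labels in which
                     the band was scored as overabundant (an assumed input).
    cancer, control: lists of sample labels.
    min_cancer:      minimum number of cancer samples required; defaults to all.
    """
    if min_cancer is None:
        min_cancer = len(cancer)
    selected = []
    for band, samples in overabundant_in.items():
        cancer_hits = sum(1 for s in cancer if s in samples)
        control_hit = any(s in samples for s in control)
        if cancer_hits >= min_cancer and not control_hit:
            selected.append(band)
    return selected


# Hypothetical scoring of four bands across three cancer and two control lanes.
cancer = ["cancer_1", "cancer_2", "cancer_3"]
control = ["control_1", "control_2"]
overabundant_in = {
    "band_A": {"cancer_1", "cancer_2", "cancer_3"},
    "band_B": {"cancer_1"},
    "band_C": {"cancer_1", "cancer_2", "cancer_3", "control_1"},
    "band_D": {"cancer_1", "cancer_2"},
}
print(select_shared_bands(overabundant_in, cancer, control))                 # strict: all cancer lanes
print(select_shared_bands(overabundant_in, cancer, control, min_cancer=2))   # relaxed: at least two
```

Requiring every cancer lane is the strictest reading of the text; relaxing min_cancer corresponds to the "at least several" alternative.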

Shared duplicated regions in cancer cells may be identified by a relevant analytical
technique, or by reference to such analysis already conducted and published. One approach that
has been highly effective in mapping approximate sub-chromosomal locations of duplicated
segments is comparative genomic hybridization (CGH). This technique involves extracting,
amplifying and labeling DNA from the subject cell; hybridizing to reference metaphase
chromosomes treated to remove repetitive sequences; and observing the position of the hybridized
DNA on the chromosomes (WO 93/18186; Gray et al.). The greater the signal intensity at a given
position, the greater the copy number of the sequences in the subject cell. Thus, regions showing
elevated staining correspond to genes duplicated in the cancer cells, while regions showing
diminished staining correspond to genes deleted in the cancer cells. Related techniques of which a
practitioner in the art will be well aware are methods for preparing and using repeat sequence
chromosome-specific nucleic acid probes (US 5,427,932; Weier et al.), methods for staining target
chromosomal DNA using labeled nucleic acid fragments in conjunction with blocking fragments
complementary to repetitive DNA segments (US 5,447,841; Gray et al.), and methods for detecting
amplified or deleted chromosomal regions using a mapped library of labeled polynucleotide probes
(US 5,472,842; Stokke et al.). If desired, multiple fluorochromes can be used as labeling agents
with CGH and related techniques, to provide a three-color visualization of deleted, normal, and
duplicated chromosome abnormalities (Lucas et al.).

The choice of a particular chromosomal mapping approach is irrelevant, especially once
the location of the duplicated region is known. If the location of the chromosome duplication is
already established for a cell line to be used in RNA comparison during the course of the present
invention, then it is unnecessary to conduct a mapping technique de novo. For example,
established cancer cell lines exist for which mapping data is already available in the public domain.
Provided in the reference section of this application is a list of over 40 articles in which the
locations of duplicated regions in particular cancer cells are described. In the context of the
present invention, a plurality of cancer cells is chosen for the screening panel based on such data,
so that they share a duplicated chromosomal region. The chromosomal location of a suspected
duplication may be confirmed by hybridization analysis, if desired, using a probe specific for the
location.
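
Choosing a panel from published mapping data amounts to checking that the duplicated segments reported for each cell line overlap. The Python sketch below shows one simple way to do so. It assumes, purely for illustration, that each cell line has a single duplicated segment on the chromosome of interest, expressed as a (start, end) pair in arbitrary map units; the cell line labels and coordinates are hypothetical and are not taken from any cited reference.

```python
def shared_region(segments):
    """Return the (start, end) interval common to all segments, or None.

    segments: dict mapping cell line label -> (start, end) of its duplicated
              segment on one chromosome, in arbitrary map units.
    """
    start = max(seg[0] for seg in segments.values())
    end = min(seg[1] for seg in segments.values())
    # The intervals overlap only if the latest start precedes the earliest end.
    return (start, end) if start < end else None


# Hypothetical duplicated segments in three candidate cell lines.
segments = {
    "line_1": (10, 60),
    "line_2": (35, 80),
    "line_3": (20, 50),
}
print(shared_region(segments))

# Adding a line whose segment lies elsewhere removes the shared region.
segments_with_outlier = dict(segments, line_4=(70, 90))
print(shared_region(segments_with_outlier))
```

The segments need not be the same size, consistent with the text: only the common overlap matters for biasing the RNA comparison towards the shared region.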

The cancer cells used for RNA comparison are also generally (but not necessarily) derived
from the same type of cancer or the same tissue. Using cells derived from the same type of cancer
increases the probability that the gene ultimately identified will be common in that type of cancer,
and suitable as a type-specific diagnostic marker. Using cells derived from different types of
cancer is in effect a search for cancer-related genes that are less tissue specific and more related
to the malignant process in general. Both types of genes are of interest for both diagnostic and
therapeutic purposes. In one illustration highlighted in Example 1, RNA was screened from the
three breast cancer cell lines BT474, SKBR3, and MCF7, which have been determined by CGH or
Southern analysis to share duplicated genetic regions in chromosomes 1, 8, 14, 17, and 20.
When the RNA from these cells was displayed, a number of RNA were found to be overabundant
in the cancer cells, but not controls (Figure 1). Three RNA overabundant in all three cancer cell
lines corresponded to cancer-associated genes located on chromosomes 1, 8, and 14 that are
listed in Table 1. The chromosome 13 gene (CH13-2a12-1) was overexpressed in 2 of the 3 cell
lines, namely BT474 and SKBR3. Southern analysis subsequently established that the
chromosome 13 gene was duplicated in the same two cell lines (Example 6, Table 5).

Selection of the source or sources of control cell RNA is also a matter of some refinement.
The control RNA can be derived from in vitro cultures of non-malignant cells, or established cell
lines derived from a non-malignant source. However, it is preferable for the control RNA to be
obtained directly from normal human tissue of the same type as the cancer cells. This is because
most normal cells do not proliferate indefinitely; hence adaptation of a cell into a cell line involves a
degree of transformation. The transforming event may, in turn, be shared with that of certain
cancer cells, at least at the level of RNA abundance. Hence, comparison of the RNA levels in
cancer cells with so-called control cell lines may lead the practitioner to miss genes that are related
to malignancy. For convenience, control cells may be maintained in culture for a brief period
before the experiment, and even stimulated; however, multiple rounds of cell division are to be
avoided if possible. Use of both stimulated and unstimulated cells as controls may help provide
RNA patterns corresponding to the normal range of abundance within various metabolic events of
the cell cycle. In one illustration highlighted in Example 1, RNA was screened using both
proliferating and non-proliferating cells. As stated, the screening of breast cancer RNA is
preferably conducted using uncultured normal mammary epithelial cells (termed "organoids") as
sources of control RNA. These cells may be obtained from surgical samples resected from healthy
breast tissue.

The RNA is preserved until use in the comparison experiment in such a way as to minimize
fragmentation. To facilitate confirmation experiments, it is useful to use RNA of a reproducible
character. For this reason, it is convenient to use RNA that has been obtained from stable
cancerous cell lines and/or ready tissue sources, although reproducibility can also be provided by
preparing enough RNA so that it can be preserved in aliquots.

For displaying relative overabundance of RNA in the cancer cells, compared with the
control cells, many standard techniques are suitable. These would include any form of subtractive
hybridization or comparative analysis. Preferred are techniques in which more than two RNA
sources are compared at the same time, such as various types of arbitrarily primed PCR
fingerprinting techniques (Welsh et al., Yoshikawa et al.). Particularly preferred are differential
mRNA display methods and variations thereof, in which the samples are run in neighboring lanes in
a separating gel. These techniques are focused towards mRNA by using primers that are specific
for the poly-A tail characteristic of mRNA (Liang et al., 1992a; U.S. Patent 5,262,311).

Because many thousands of genes are expressed in the cells of higher organisms at any
one time, it is preferable to improve the legibility of the display by surveying only a subset of the
RNA at a time. Methods for accomplishing this are known in the art. A preferred method is by
using selective primers that initiate PCR replication for a subset of the RNA. Thus, the RNA is first
reverse transcribed by standard techniques. Short primers are used for the selection, preferably
chosen such that alternative primers used in a series of like assays can complete a comprehensive
survey of the mRNA.

In a preferred example, primers can be used for the 3' region of the mRNAs which have an
oligo-dT sequence, followed by two other nucleotides (TiNM, where i = 11, N ∈ {A,C,G}, and M ∈
{A,C,G,T}). Thus, 12 possible primers are required to complete the survey. A random or arbitrary
primer of minimal length can then be used for replication towards what corresponds in the
sequence to the 5' region of the mRNA. The optimal length for the random primer is about 10
nucleotides. The product of the PCR reaction is labeled with a radioisotope, such as 35S. The
labeled cDNA is then separated by molecular weight, such as on a polyacrylamide sequencing gel.
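
The count of 12 anchored primers follows directly from the definition TiNM with i = 11: three choices for N times four choices for M. The short Python sketch below enumerates them as a check on the arithmetic. The function name and its parameter are illustrative only and do not appear in the original disclosure.

```python
from itertools import product


def anchored_primers(oligo_dt_length=11):
    """Enumerate 3' anchored primers of the form T(i)NM.

    N is the first base after the oligo-dT run and excludes T;
    M is the final 3' base and may be any of the four nucleotides.
    """
    n_bases = "ACG"
    m_bases = "ACGT"
    stem = "T" * oligo_dt_length
    return [stem + n + m for n, m in product(n_bases, m_bases)]


primers = anchored_primers()
print(len(primers))
for primer in primers:
    print(primer)
```

Excluding T at position N keeps each primer anchored at the junction between the poly-A tail and the transcript body, which is why the set has 12 members rather than 16.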

If desired, variations on the differential display technique may be employed. For example,
one-base oligo-dT primers may be used (Liang et al., 1993 & 1994), although this is generally less
preferred because the display pattern is correspondingly more complex. Selection of primers may
be optimized mathematically depending on the number of RNA species in a tissue of interest
(Bauer et al.). The method may be adapted for non-denaturing gels, and for use with automatic
DNA sequencers (Bauer et al.). Alternative radioisotopes (Trentmann et al.) or fluorochromes (Sun
et al.) may be used for labeling the differential display. Differential display may optionally be
combined with a ribonuclease protection assay (Yeatman et al.). PCR primers may optionally
incorporate a restriction site to facilitate cloning (Linskens et al., Ayala et al.). Using Taq
polymerase from multiple manufacturers can increase the amount of variation under otherwise
identical conditions (Haag et al.). Nested PCR primers may be used in differential display to
decrease background created by oligo-dT primers (WO 95/33760). Other variants of the
differential display technique are known in the art and described inter alia in the references cited in
this disclosure. The use of such modifications is within the scope of the present invention, but is
not required, as evidenced by the examples described below.

Based on the comparison of relative abundance of RNA, particular RNAs are chosen which
are present as a higher proportion of the RNA in cancerous cells, compared with control cells.
When using the differential display method, the cDNA corresponding to overabundant RNA will
produce a band with greater proportional intensity amongst neighboring cDNA bands, compared
with the proportional intensity in the control lanes. Desired cDNAs can be recovered most directly
by cutting the spot in the gel corresponding to the band, and recovering the DNAs therefrom.
Recovered cDNA can be replicated again for further use by any technique or combination of
techniques known in the art, including PCR and cloning into a suitable carrier.
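
The comparison of proportional band intensity can be illustrated with a short calculation. This is only a sketch: the band names and intensity readings are hypothetical, and the disclosure does not prescribe any particular numerical procedure.

```python
def proportional_intensity(lane, band):
    """Intensity of one band as a fraction of all bands measured in its lane."""
    total = sum(lane.values())
    if total <= 0:
        raise ValueError("lane has no measurable signal")
    return lane[band] / total


# Hypothetical densitometry readings (arbitrary units) for neighboring bands.
cancer_lane = {"band_1": 100.0, "band_2": 300.0, "band_3": 100.0}
control_lane = {"band_1": 100.0, "band_2": 100.0, "band_3": 100.0}

cancer_fraction = proportional_intensity(cancer_lane, "band_2")
control_fraction = proportional_intensity(control_lane, "band_2")
fold_change = cancer_fraction / control_fraction
print(f"band_2 fold change, cancer vs control: {fold_change:.2f}")
```

Expressing each band relative to its neighbors in the same lane corrects for lane-to-lane differences in the total amount of labeled cDNA loaded.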

An optional but highly beneficial additional screening step, typically performed
subsequently to an RNA comparison as described above, is aimed at identifying genes that are
duplicated in a substantial proportion of cancers. This is conducted by using cDNA such as
selected from differential display to probe digests of chromosomal DNA obtained from two or more
cancerous cells, such as cancer cell lines. Chromosomal DNA from non-cancerous cells that
essentially reflects the germ line in terms of gene copy number is used for the control. A preferred
source of control DNA in experiments for human cancer genes is placental DNA, which is readily
obtainable. The DNA samples are cleaved at sequence-specific sites along the chromosome, most
usually with a suitable restriction enzyme into fragments of appropriate size. The DNA can be
blotted directly onto a suitable medium, or separated on an agarose gel before blotting. The latter
method is preferred, because it enables a comparison of the hybridizing chromosomal restriction
fragment to determine whether the probe is binding to the same fragment in all samples. The
amount of probe binding to DNA digests from each of the cancer cells is compared with the amount
binding to control DNA.

Because the comparison is quantitative, it is preferable to standardize the measurement
internally. One method is to administer a second probe to the same blot probing for a second
chromosomal gene unlikely to be duplicated in the cancer cells. This method is preferred, because
it standardizes not only for differences in the amount of DNA provided, but also for differences in
the amount transferred during blotting. This can be accomplished by using alternative labels for
the two probes, or by stripping the first probe with a suitable eluant before administering the
second.
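
The internal standardization described above can be sketched as a ratio of ratios: the candidate probe signal is divided by the reference probe signal in each lane, and the cancer lane is then compared with the control lane. The function name, argument names and readings below are hypothetical and serve only to illustrate the arithmetic.

```python
def relative_copy_number(test_signal, ref_signal, ctrl_test_signal, ctrl_ref_signal):
    """Normalized candidate-probe signal in a cancer lane relative to the control lane.

    Each candidate-probe signal is first divided by the signal of a reference
    probe on the same blot, which corrects for DNA loading and transfer.
    """
    if min(ref_signal, ctrl_test_signal, ctrl_ref_signal) <= 0:
        raise ValueError("reference and control signals must be positive")
    sample_ratio = test_signal / ref_signal
    control_ratio = ctrl_test_signal / ctrl_ref_signal
    return sample_ratio / control_ratio


# Hypothetical readings (arbitrary units): cancer cell line versus placental DNA.
ratio = relative_copy_number(
    test_signal=900.0, ref_signal=150.0,
    ctrl_test_signal=200.0, ctrl_ref_signal=100.0,
)
print(f"relative copy number: {ratio:.1f}")
```

A value near 1 would indicate a copy number similar to the control, while a value well above 1 would be consistent with duplication of the gene detected by the candidate probe.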

To eliminate cDNA for mitochondrial genes, it is preferable to include in a parallel analysis
a mitochondrial DNA preparation digested with the same restriction enzyme. Any cDNA probe that
hybridizes to the appropriate mitochondrial restriction fragments can be suspected of
corresponding to a mitochondrial gene.

In the initial replication of the RNA, the random primer may bind at any location along the
RNA sequence. Thus, the copied and replicated segment may be a fragment of the full-length
RNA. Longer cDNA corresponding to a greater portion of the sequence can be obtained, if
desired, by several techniques known to practitioners of ordinary skill. These include using the
cDNA fragment to isolate the corresponding RNA, or to isolate complementary DNA from a cDNA
library of the same species. Preferably, the library is derived from the same tissue source, and
more preferably from a cancer cell line of the same type. For example, for cDNA corresponding to
human breast cancer genes, a preferred library is derived from breast cancer cell line BT474,
constructed in lambda gt10.

Sequences of the cDNA can be determined by standard techniques, or by submitting the
sample to commercial sequencing services. The chromosomal locations of the genes can be
determined by any one of several methods known in the art, such as in situ hybridization using
chromosomal smears, or panels of somatic cell hybrids of known chromosomal composition.

The cDNA obtained through the selection process outlined can then be tested against a
larger panel of cancer cell lines and/or fresh tumor cells to determine what proportion of the cells
have duplicated the gene. This can be accomplished by using the cDNA as a probe for
chromosomal DNA digests, as described earlier. As illustrated in the Example section, a preferred
method for conducting this determination is Southern analysis.

The cDNA can also be used to determine what proportion of the cells have RNA
overabundance. This can be accomplished by standard techniques, such as slot blots or blots of
agarose gels, using whole RNA or messenger RNA from each of the cells in the panel. The blots
are then probed with the cDNA using standard techniques. It is preferable to provide an internal
loading and blotting control for this analysis. A preferred method is to re-probe the same blot for
transcripts of a gene likely to be present in about the same level in all cells of the same type, such
as the gene for a cytoskeletal protein. Thus, a preferred second probe is the cDNA for beta-actin.
Using a novel cDNA found by this selection procedure, it is anticipated that essentially all
cancer cells showing gene duplication will also show RNA overabundance, but that some will show
RNA overabundance without gene duplication.
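
The anticipated pattern across a panel can be tabulated once each cell line has a normalized DNA ratio and a normalized RNA ratio relative to control. The sketch below is illustrative only: the two-fold cutoff, the cell line names and the ratios are assumptions made for the example and do not come from the disclosure.

```python
from collections import Counter


def classify(dna_ratio, rna_ratio, threshold=2.0):
    """Assign a cell line to a category from its DNA and RNA ratios versus control.

    The two-fold threshold is a hypothetical cutoff chosen for illustration.
    """
    duplicated = dna_ratio >= threshold
    overabundant = rna_ratio >= threshold
    if duplicated and overabundant:
        return "duplication with RNA overabundance"
    if overabundant:
        return "RNA overabundance without duplication"
    if duplicated:
        return "duplication without RNA overabundance"
    return "neither"


# Hypothetical panel: cell line -> (DNA ratio, RNA ratio), both relative to control.
panel = {
    "line_A": (4.0, 6.5),
    "line_B": (1.0, 3.2),
    "line_C": (1.1, 0.9),
    "line_D": (2.5, 5.0),
}

summary = Counter(classify(dna, rna) for dna, rna in panel.values())
for category, count in summary.items():
    print(f"{category}: {count}")
```

Under the expectation stated above, the category "duplication without RNA overabundance" would be rare or empty, while "RNA overabundance without duplication" would contain at least some cell lines.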

The practitioner will readily appreciate that the strategies for identifying genes that are
duplicated and/or associated with RNA overabundance may be reversed appropriately to screen
for genes that are deleted and/or associated with RNA underabundance. The principles are
essentially the same. Genes that are frequently down-regulated in cancer (such as tumor
suppressor genes) may be down-regulated by different mechanisms in different cells, and a gene
with this behavior is more likely to be central to malignant transformation or persistence of the
malignant state.

To screen for such down-regulated genes according to the present invention, RNA is
prepared from a plurality of tumors or cancer cell lines and the abundance is compared with RNA
preparation from control cells. Again, it is highly preferable to use cancer cells that share a deleted
gene in the same chromosomal region, in order to focus any differences at the RNA level towards
particular alterations in cancer cells and away from normal variations or coincidental changes. The
CGH technique may be used to identify deletions in previously uncharacterized cancer cells. As
before, cancer cells may be chosen on the basis of previous knowledge of deleted regions; there is
no need to conduct methods such as CGH on previously characterized lines. cDNA from the RNA
of cancer cells is displayed (preferably by differential display) alongside cDNA copied from
(preferably uncultured) control cells, and cDNA is selected that appears to be underrepresented in
at least two (preferably more) of the cancer cells compared with the control cells. cDNA thus
selected may optionally be further screened against digested DNA preparations, to confirm that the
RNA underabundance observed in the cancer cell populations is attributable in at least a proportion
of the cells to an actual gene deletion.

As before, the cDNA may be used for sequencing or rescuing additional polynucleotides, in
this case not from the cancer cells but from cells containing or expressing the gene at normal
levels. Pharmaceuticals based on deleted genes or those associated with underexpressed RNA
are typically oriented at restoring or upregulating the gene, or a functional equivalent of the
encoded gene product.

The identification of four exemplary cancer associated genes

To identify particular RNA that is overabundant in cancer cells, RNA has been compared
between breast cancer cells and control cells. The amount of total cellular RNA was compared using
a modified differential display method. Primers were used for the 3' region of the mRNAs which have
an oligo-dT sequence, followed by two other nucleotides as described in the previous section.
Random or arbitrary primers of about 10 nucleotides were used for replication towards what
corresponds in the sequence to the 5' region of the mRNA. The labeled amplification product was
then separated by molecular weight on a polyacrylamide sequencing gel.
Particular mRNAs were chosen that were present in a higher proportion of the RNA in
cancerous cells, compared with control cells, according to the proportional intensity amongst
neighboring cDNA bands. The cDNA was recovered directly from the gel and amplified to provide a
probe for screening. Candidate polynucleotides were screened by a number of criteria, including both
Northern and Southern analysis to determine if the corresponding genes were duplicated or
responsible for RNA overabundance in breast cancer cells. Sequence data of the polynucleotides
were obtained and compared with sequences in GenBank. Novel polynucleotides with the desired
expression patterns were used to probe for longer cDNA inserts in a λgt10 library constructed from
the breast cancer cell line BT474, which were then sequenced.
Further description of the actual experimental events that occurred during identification of the
four exemplary genes, and sequence data for CH1-9a11-2, CH8-2a13-1, CH13-2a12-1, and
CH14-2a16-1 are provided in the Example section.

Preparation of polynucleotides, polypeptides and antibodies

Polynucleotides based on the cDNA of CH1-9a11-2, CH8-2a13-1, CH13-2a12-1, and
CH14-2a16-1 can be rescued from cloned plasmids and phage provided as part of this invention. They
may also be obtained from breast cancer cell libraries or mRNA preparations, or from normal human
tissues such as placenta, by judicious use of primers or probes based on the sequence data provided
herein. Alternatively, the sequence data provided herein can be used in chemical synthesis to
produce a polynucleotide with an identical sequence, or that incorporates occasional variations.

Polypeptides encoded by the corresponding mRNA can be prepared by several different
methods, all of which will be known to a practitioner of ordinary skill. For example, the appropriate
strand of the full-length cDNA can be operably linked to a suitable promoter, and transfected into a
suitable host cell. The host cell is then cultured under conditions that allow transcription and
translation to occur, and the polypeptide is subsequently recovered. Another convenient method is to
determine the polynucleotide sequence of the cDNA, and predict the polypeptide sequence according
to the genetic code. A polypeptide can then be prepared directly, for example, by chemical synthesis,
either identical to the predicted sequence, or incorporating occasional variations.

Antibodies against polypeptides of this invention may be prepared by any method known in
the art. For stimulating antibody production in an animal, it is often preferable to enhance the
immunogenicity of a polypeptide by such techniques as polymerization with glutaraldehyde, or
combining with an adjuvant, such as Freund's adjuvant. The immunogen is injected into a suitable
experimental animal: preferably a rodent for the preparation of monoclonal antibodies; preferably a
larger animal such as a rabbit or sheep for preparation of polyclonal antibodies. It is preferable to
provide a second or booster injection after about 4 weeks, and begin harvesting the antibody source
no less than about 1 week later.

Sera harvested from the immunized animals provide a source of polyclonal antibodies.
Detailed procedures for purifying specific antibody activity from a source material are known within the
art. Unwanted activity cross-reacting with other antigens, if present, can be removed, for example, by
running the preparation over adsorbents made of those antigens attached to a solid phase, and
collecting the unbound fraction. If desired, the specific antibody activity can be further purified by such
techniques as protein A chromatography, ammonium sulfate precipitation, ion exchange
chromatography, high-performance liquid chromatography and immunoaffinity chromatography on a
column of the immunizing polypeptide coupled to a solid support.

Alternatively, immune cells such as splenocytes can be recovered from the immunized
animals and used to prepare a monoclonal antibody-producing cell line. See, for example, Harlow &
Lane (1988), U.S. Patent Nos. 4,491,632 (J.R. Wands et al.), U.S. 4,472,500 (C. Milstein et al.), and
U.S. 4,444,887 (M.K. Hoffman et al.).

Briefly, an antibody-producing line can be produced inter alia by cell fusion, or by transfecting
antibody-producing cells with Epstein Barr Virus, or transforming with oncogenic DNA. The treated
cells are cloned and cultured, and clones are selected that produce antibody of the desired specificity.
Specificity testing can be performed on culture supernatants by a number of techniques, such as
using the immunizing polypeptide as the detecting reagent in a standard immunoassay, or using cells
expressing the polypeptide in immunohistochemistry. A supply of monoclonal antibody from the
selected clones can be purified from a large volume of tissue culture supernatant or from the ascites
fluid of suitably prepared host animals injected with the clone.

Effective variations of this method include those in which the immunization with the
polypeptide is performed on isolated cells. Antibody fragments and other derivatives can be prepared
by methods of standard protein chemistry, such as subjecting the antibody to cleavage with a
proteolytic enzyme. Genetically engineered variants of the antibody can be produced by obtaining a
polynucleotide encoding the antibody, and applying the general methods of molecular biology to
introduce mutations and translate the variant.

Use in diagnosis

Novel cDNA sequences corresponding to genes associated with cancer are potentially useful
as diagnostic aids. Similarly, polypeptides encoded by such genes, and antibodies specific for these
polypeptides, are also potentially useful as diagnostic aids.

More specifically, gene duplication or overabundance of RNA in particular cells can help
identify those cells as being cancerous, and thereby play a part in the initial diagnosis. Increased
levels of RNA corresponding to CH1-9a11-2, CH8-2a13-1, CH13-2a12-1, and CH14-2a16-1 are
present in a substantial proportion of breast cancer cell lines and primary breast tumors. In addition,
preliminary Northern analysis using probes for CH8-2a13-1, CH13-2a12-1, and CH14-2a16-1
indicates that these genes may be duplicated or be associated with RNA overabundance in certain
cell lines derived from cancers other than breast cancer, including colon cancer, lung cancer,
prostate cancer, glioma, and ovarian cancer.
For patients already diagnosed with cancer, gene duplication or overabundance of RNA can
assist with clinical management and prognosis. For example, overabundance of RNA may be a
useful predictor of disease survival, metastasis, susceptibility to various regimens of standard
chemotherapy, the stage of the cancer, or its aggressiveness. See generally the article by Blast, U.S.
Patent No. 4,968,603 (Slamon et al.) and PCT Application WO 94/00601 (Levine et al.). All of these
determinations are important in helping the clinician choose between the available treatment options.

A particularly important diagnostic application contemplated in this invention is the
identification of patients suitable for gene-specific therapy, as outlined in the following section. For
example, treatment directed against a particular gene or gene product is appropriate in cancers where
the gene is duplicated or there is RNA overabundance. Given a particular pharmaceutical that is
directed at a particular gene, a diagnostic test specific for the same gene is important in selecting
patients likely to benefit from the pharmaceutical. Given a selection of such pharmaceuticals specific
for different genes, diagnostic tests for each gene are important in selecting which pharmaceutical is
likely to benefit a particular patient.

The polynucleotide, polypeptide, and antibodies embodied in this invention provide specific
reagents that can be used in standard diagnostic procedures. The actual procedures for conducting
diagnostic tests are extensively known in the art, and are routine for a practitioner of ordinary skill.
See, for example, U.S. Patent No. 4,968,603 (Slamon et al.), and PCT Applications WO 94/00601
(Levine et al.) and WO 94/17414 (K. Keyomarsi et al.). What follows is a brief non-limiting survey of
some of the known procedures that can be applied.

Generally, to perform a diagnostic method of this invention, one of the compositions of this
invention is provided as a reagent to detect a target in a clinical sample with which it reacts. Thus, the
polynucleotide of this invention can be used as a reagent to detect a DNA or RNA target such as
might be present in a cell with duplication or RNA overabundance of the corresponding gene. The
polypeptide can be used as a reagent to detect a target for which it has a specific binding site, such as
an antibody molecule or (if the polypeptide is a receptor) the corresponding ligand. The antibody can
be used as a reagent to detect a target it specifically recognizes, such as the polypeptide used as an
immunogen to raise it.

The target is supplied by obtaining a suitable tissue sample from an individual for whom the
diagnostic parameter is to be measured. Relevant test samples are those obtained from individuals
suspected of containing cancerous cells, particularly breast cancer cells. Many types of samples are
suitable for this purpose, including those that are obtained near the suspected tumor site by biopsy or
surgical dissection, in vitro cultures of cells derived therefrom, blood, and blood components. If
desired, the target may be partially purified from the sample or amplified before the assay is
conducted. The reaction is performed by contacting the reagent with the sample under conditions that
will allow a complex to form between the reagent and the target. The reaction may be performed in
solution, or on a solid tissue sample, for example, using histology sections. The formation of the
complex is detected by a number of techniques known in the art. For example, the reagent may be
supplied with a label and unreacted reagent may be removed from the complex; the amount of
remaining label thereby indicating the amount of complex formed. Further details and alternatives for
complex detection are provided in the descriptions that follow.

To determine whether the amount of complex formed is representative of cancerous or
non-cancerous cells, the assay result is compared with a similar assay conducted on a control sample. It
is generally preferable to use a control sample which is from a non-cancerous source, and otherwise
similar in composition to the clinical sample being tested. However, any control sample may be
suitable provided the relative amount of target in the control is known or can be used for comparative
purposes. Where the assay is being conducted on tissue sections, suitable control cells with normal
histopathology may surround the cancerous cells being tested. It is often preferable to conduct the
assay on the test sample and the control sample simultaneously. However, if the amount of complex
formed is quantifiable and sufficiently consistent, it is acceptable to assay the test sample and control
sample on different days or in different laboratories.

A polynucleotide embodied in this invention can be used as a reagent for determining gene
duplication or RNA overabundance that may be present in a clinical sample. The binding of the
reagent polynucleotide to a target in a clinical sample generally relies in part on a hybridization
reaction between a region of the polynucleotide reagent, and the DNA or RNA in a sample being
tested.

If desired, the nucleic acid may be extracted from the sample, and may also be partially
purified. To measure gene duplication, the preparation is preferably enriched for chromosomal DNA;
to measure RNA overabundance, the preparation is preferably enriched for RNA. The target
polynucleotide can be optionally subjected to any combination of additional treatments, including
digestion with restriction endonucleases, size separation, for example by electrophoresis in agarose
or polyacrylamide, and affixing to a reaction matrix, such as a blotting material.

Hybridization is allowed to occur by mixing the reagent polynucleotide with a sample
suspected of containing a target polynucleotide under appropriate reaction conditions. This may be
followed by washing or separation to remove unreacted reagent. Generally, both the target
polynucleotide and the reagent must be at least partly equilibrated into the single-stranded form in
order for complementary sequences to hybridize efficiently. Thus, it may be useful (particularly in
tests for DNA) to prepare the sample by standard denaturation techniques known in the art.

The minimum complementarity between the reagent sequence and the target sequence for a
complex to form depends on the conditions under which the complex-forming reaction is allowed to
occur. Such conditions include temperature, ionic strength, time of incubation, the presence of
additional solutes in the reaction mixture such as formamide, and washing procedure. Higher
stringency conditions are those under which higher minimum complementarity is required for stable
hybridization to occur. It is generally preferable in diagnostic applications to increase the specificity of
the reaction, minimizing cross-reactivity of the reagent polynucleotide with alternative undesired
hybridization sites in the sample. Thus, it is preferable to conduct the reaction under conditions of
high stringency: for example, in the presence of high temperature, low salt, formamide, a combination
of these, or followed by a low-salt wash.
In order to detect the complexes formed between the reagent and the target, the reagent is
generally provided with a label. Some of the labels often used in this type of assay include
phosphorus radioisotopes such as 32P, chemiluminescent or fluorescent reagents such as fluorescein, and
enzymes such as alkaline phosphatase that are capable of producing a colored solute or precipitant.
The label may be intrinsic to the reagent, it may be attached by direct chemical linkage, or it may be
connected through a series of intermediate reactive molecules, such as a biotin-avidin complex, or a
series of inter-reactive polynucleotides. The label may be added to the reagent before hybridization
with the target polynucleotide, or afterwards.

To improve the sensitivity of the assay, it is often desirable to increase the signal ensuing
from hybridization. This can be accomplished by replicating either the target polynucleotide or the
reagent polynucleotide, such as by a polymerase chain reaction. Alternatively, a combination of
serially hybridizing polynucleotides or branched polynucleotides can be used in such a way that
multiple label components become incorporated into each complex. See U.S. Patent No. 5,124,246
(Urdea et al.).

An antibody embodied in this invention can also be used as a reagent in cancer diagnosis, or
for determining gene duplication or RNA overabundance that may be present in a clinical sample.
This relies on the fact that overabundance of RNA in affected cells is often associated with increased
production of the corresponding polypeptide. Several of the genes up-regulated in cancer cells
encode for cell surface receptors; for example, erbB-2, c-myc and epidermal growth factor.
Alternatively, the RNA may encode a protein kept inside the cell, or it may encode a protein secreted
by the cell into the surrounding milieu.

Any such protein product can be detected in solid tissue samples and cultured cells by
immunohistological techniques that will be obvious to a practitioner of ordinary skill. Generally, the
tissue is preserved by a combination of techniques which may include cooling, exchanging into
different solvents, fixing with agents such as paraformaldehyde, or embedding in a commercially
available medium such as paraffin or OCT. A section of the sample is suitably prepared and overlaid
with a primary antibody specific for the protein.

The primary antibody may be provided directly with a suitable label. More frequently, the
primary antibody is detected using one of a number of developing reagents which are easily produced
or available commercially. Typically, these developing reagents are anti-immunoglobulin or protein A,
and they typically bear labels which include, but are not limited to: fluorescent markers such as
fluorescein, enzymes such as peroxidase that are capable of precipitating a suitable chemical
compound, electron dense markers such as colloidal gold, or radioisotopes such as 125I. The section
is then visualized using an appropriate microscopic technique, and the level of labeling is compared
between the suspected cancer cell and a control cell, such as cells surrounding the tumor area or
those taken from an alternative site.

The amount of protein corresponding to the cancer-associated gene may be detected in a
standard quantitative immunoassay. If the protein is secreted or shed from the cell in any appreciable
amount, it may be detectable in plasma or serum samples. Alternatively, the target protein may be
solubilized or extracted from a solid tissue sample. Before quantitating, the protein may optionally be
affixed to a solid phase, such as by a blot technique or using a capture antibody.

A number of immunoassay methods are established in the art for performing the quantitation.
For example, the protein may be mixed with a pre-determined non-limiting amount of the reagent
antibody specific for the protein. The reagent antibody may contain a directly attached label, such as
an enzyme or a radioisotope, or a second labeled reagent may be added, such as
anti-immunoglobulin or protein A. For a solid-phase assay, unreacted reagents are removed by
washing. For a liquid-phase assay, unreacted reagents are removed by some other separation
technique, such as filtration or chromatography. The amount of label captured in the complex is
positively related to the amount of target protein present in the test sample. A variation of this
technique is a competitive assay, in which the target protein competes with a labeled analog for
binding sites on the specific antibody. In this case, the amount of label captured is negatively related
to the amount of target protein present in a test sample. Results obtained using any such assay on a
sample from a suspected cancer-bearing source are compared with those from a noncancerous
source.
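
The inverse relation between captured label and target protein in a competitive assay can be illustrated with a toy model. The sketch below assumes, purely for illustration, that the target and the labeled analog bind with equal affinity and that all available binding sites are occupied whenever ligand is in excess; the function name and the quantities used are hypothetical and do not describe any particular assay.

```python
def bound_label_competitive(target, labeled_analog, binding_sites):
    """Label captured when target and labeled analog compete for limited sites.

    Toy assumptions: equal affinity for both ligands, and every site is
    occupied whenever the total ligand exceeds the number of sites.
    """
    total_ligand = target + labeled_analog
    if total_ligand <= 0:
        return 0.0
    occupied_sites = min(binding_sites, total_ligand)
    # Sites are shared in proportion to each ligand's fraction of the total.
    return occupied_sites * labeled_analog / total_ligand


# Increasing amounts of target protein with a fixed amount of labeled analog.
target_amounts = (0.0, 50.0, 100.0, 300.0)
signals = [
    bound_label_competitive(t, labeled_analog=100.0, binding_sites=50.0)
    for t in target_amounts
]
for amount, signal in zip(target_amounts, signals):
    print(f"target {amount:6.1f} -> captured label {signal:6.2f}")
```

In this model the captured label falls as the target increases, which is the negative relation described above; in practice the sample reading would be compared with those obtained from standards and from a noncancerous source.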

A polypeptide embodied in this invention can also be used as a reagent in cancer diagnosis,
or for determining gene duplication or RNA overabundance that may be present in a clinical sample.
Overabundance of RNA in affected cells may result in the corresponding polypeptide being produced
by the cells in an abnormal amount. On occasion, overabundance of RNA may occur concurrently
with expression of the polypeptide in an unusual form. This in turn may result in stimulation of the
immune response of the host to produce its own antibody molecules that are specific for the
polypeptide. Thus, a number of human hybridomas have been raised from cancer patients that
produce antibodies against their own tumor antigens.

To use the polypeptide in the detection of such antibodies in a subject suspected of having
cancer, an immunoassay is conducted. Suitable methods are generally the same as the
immunoassays outlined in the preceding paragraphs, except that the polypeptide is provided as a
reagent, and the antibody is the target in the clinical sample which is to be quantified. For example,
human IgG antibody molecules present in a serum sample may be captured with solid-phase protein
A, and then overlaid with the labeled polypeptide reagent. The amount of antibody would then be
proportional to the label attached to the solid phase. Alternatively, cells or tissue sections expressing
the polypeptide may be overlaid first with the test sample containing the antibody, and then with a
detecting reagent such as labeled anti-immunoglobulin. The amount of antibody would then be
proportional to the label attached to the cells. The amount of antibody detected in the sample from a
suspected cancerous source would be compared with the amount detected in a control sample.

These diagnostic procedures may be performed by diagnostic laboratories, experimental
laboratories, practitioners, or private individuals. This invention provides diagnostic kits which can be
used in these settings. The presence of cancer cells in the individual may be manifest in a clinical
sample obtained from that individual as an alteration in the DNA, RNA, protein, or antibodies
contained in the sample. An alteration in one of these components resulting from the presence of
cancer may take the form of an increase or decrease of the level of the component or an alteration in
the form of the component, compared with that in a sample from a healthy individual. The clinical
sample is optionally pre-treated for enrichment of the target being tested for. The user then applies a
reagent contained in the kit in order to detect the changed level or alteration in the diagnostic
component.

Each kit necessarily comprises the reagent which renders the procedure specific: a reagent
polynucleotide, used for detecting target DNA or RNA; a reagent antibody, used for detecting target
protein; or a reagent polypeptide, used for detecting target antibody that may be present in a sample
to be analyzed. The reagent is supplied in a solid form or liquid buffer that is suitable for inventory
storage, and later for exchange or addition into the reaction medium when the test is performed.
Suitable packaging is provided. The kit may optionally provide additional components that are useful
in the procedure. These optional components include buffers, capture reagents, developing reagents,
labels, reacting surfaces, means for detection, control samples, instructions, and interpretive
information.

Use in pharmaceutical development

Embodied in this invention are modes of treating subjects bearing cancer cells that have
overabundance of the particular RNA described. The strategy used to obtain the cDNAs provided in
this invention was deliberately focused on genes that achieve RNA overabundance by gene
duplication in some cells, and by alternative mechanisms in other cells. These alternative
mechanisms may include, for example, translocation or enhancement of transcription enhancing
elements near the coding region of the gene, deletion of repressor binding sites, or altered production
of gene regulators. Such mechanisms would result in more RNA being transcribed from the same
gene. Alternatively, the same amount of RNA may be transcribed, but may persist longer in the cell,
resulting in greater abundance. This could occur, for example, by reduction in the level of ribozymes
or protein enzymes that degrade RNA, or in the modification of the RNA to render it more resistant to
such enzymes or spontaneous degradation.

Thus, different cells make use of at least two different mechanisms to achieve a single result:
the overabundance of a particular RNA. This suggests that RNA overabundance of these genes is
central to the cancer process in the affected cells. Interfering with the specific gene or gene product
would consequently modify the cancer process. It is an objective of this invention to provide
pharmaceutical compositions that enable therapy of this kind.

One way this invention achieves this objective is through screening candidate drugs. The
general screening strategy is to apply the candidate to a manifestation of a gene associated with
cancer, and then determine whether the effect is beneficial and specific. For example, a composition
that interferes with a polynucleotide or polypeptide corresponding to any of the novel cancer-associated
genes described herein has the potential to block the associated pathology when administered to a
tumor of the appropriate phenotype. It is not necessary that the mechanism of interference be known;
only that the interference be preferential for cancerous cells (or cells near the cancer site) but not
other cells.

A preferred method of screening is to provide cells in which a polynucleotide related to a
cancer gene has been transfected. See, for example, PCT application WO 93/08701. A practitioner
of ordinary skill will be well acquainted with techniques for transfecting eukaryotic cells, including the
preparation of a suitable vector, such as a viral vector; conveying the vector into the cell, such as by
electroporation; and selecting cells that have been transformed, such as by using a reporter or drug
sensitivity element.

A cell line is chosen which has a phenotype desirable in testing, and which can be maintained
well in culture. The cell line is transfected with a polynucleotide corresponding to one of the
cancer-associated genes identified herein. Transfection is performed such that the polynucleotide is
operably linked to a genetic controlling element that permits the correct strand of the polynucleotide to
be transcribed within the cell. Successful transfection can be determined by the increased abundance
of the RNA compared with an untransfected cell. It is not necessary that the cell previously be devoid
of the RNA, only that the transfection result in a substantial increase in the level observed. RNA
abundance in the cell is measured using the same polynucleotide, according to the hybridization
assays outlined earlier.

Drug screening is performed by adding each candidate to a sample of transfected cells, and
monitoring the effect. The experiment includes a parallel sample which does not receive the
candidate drug. The treated and untreated cells are then compared by any suitable phenotypic
criteria, including but not limited to microscopic analysis, viability testing, ability to replicate,
histological examination, the level of a particular RNA or polypeptide associated with the cells, the
level of enzymatic activity expressed by the cells or cell lysates, and the ability of the cells to interact
with other cells or compounds. Differences between treated and untreated cells indicate effects
attributable to the candidate. In a preferred method, the effect of the drug on the cell transfected with
the polynucleotide is also compared with the effect on a control cell. Suitable control cells include
untransfected cells of similar ancestry, cells transfected with an alternative polynucleotide, or cells
transfected with the same polynucleotide in an inoperative fashion. Optimally, the drug has a greater
effect on operably transfected cells than on control cells.
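The comparison described above can be sketched numerically. The following is a minimal illustration only: the readout values, the function names, and the `min_margin` threshold are hypothetical and are not part of the described method, which leaves the choice of phenotypic criterion open.

```python
def drug_effect(treated, untreated):
    """Fractional reduction of a phenotypic readout (e.g. RNA level or
    viability) in treated cells relative to the untreated parallel sample."""
    return (untreated - treated) / untreated


def is_preferential(transfected, control, min_margin=0.2):
    """True if the candidate has a greater effect on operably transfected
    cells than on control cells. min_margin is an arbitrary, hypothetical
    threshold chosen for illustration."""
    effect_transfected = drug_effect(**transfected)
    effect_control = drug_effect(**control)
    return effect_transfected - effect_control >= min_margin


# Hypothetical readouts in arbitrary units.
transfected = {"treated": 30.0, "untreated": 100.0}
control = {"treated": 90.0, "untreated": 100.0}

print(drug_effect(**transfected))             # effect on transfected cells
print(drug_effect(**control))                 # effect on control cells
print(is_preferential(transfected, control))  # True for these values
```

Expressing the effect as a fraction of the untreated sample keeps the two cell populations comparable even when their baseline readouts differ.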

Desirable effects of a candidate drug include an effect on any phenotype that was conferred
by transfection of the cell line with the polynucleotide from the cancer-associated gene, or an effect
that could limit a pathological feature of the gene in a cancerous cell. Examples of the first type would
be a drug that limits the overabundance of RNA in the transfected cell, limits production of the
encoded protein, or limits the functional effect of the protein. The effect of the drug would be apparent
when comparing results between treated and untreated cells. An example of the second type would
be a drug that makes use of the transfected gene or a gene product to specifically poison the cell.
The effect of the drug would be apparent when comparing results between operably transfected cells
and control cells.

Use in treatment

This invention also provides gene-specific pharmaceuticals in which each of the
polynucleotides, polypeptides, and antibodies embodied herein is a specific active ingredient in
pharmaceutical compositions. Such compositions may decrease the pathology of cancer cells on
their own, or render the cancer cells more susceptible to treatment by non-specific agents, such
as classical chemotherapy or radiation.

An example of how polynucleotides embodied in this invention can be effectively used in
treatment is gene therapy. See, for example, Morgan et al., Culver et al., and U.S. Patent No.
5,399,346 (French et al.). The general principle is to introduce the polynucleotide into a cancer cell in
a patient, and allow it to interfere with the expression of the corresponding gene, such as by
complexing with the gene itself or with the RNA transcribed from the gene. Entry into the cell is
facilitated by suitable techniques known in the art, such as providing the polynucleotide in the form of a
suitable vector, or encapsulation of the polynucleotide in a liposome. The polynucleotide may be
provided to the cancer site by an antigen-specific homing mechanism, or by direct injection.

A preferred mode of gene therapy is to provide the polynucleotide in such a way that it will
replicate inside the cell, enhancing and prolonging the interference effect. Thus, the polynucleotide is
operably linked to a suitable promoter, such as the natural promoter of the corresponding gene, a
heterologous promoter that is intrinsically active in cancer cells, or a heterologous promoter that can
be induced by a suitable agent. Preferably, the construct is designed so that the polynucleotide
sequence operably linked to the promoter is complementary to the sequence of the corresponding
gene. Thus, once integrated into the cellular genome, the transcript of the administered
polynucleotide will be complementary to the transcript of the gene, and capable of hybridizing with it.
This approach is known as anti-sense therapy. See, for example, Culver et al. and Roth.
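The complementarity requirement can be illustrated with a short sketch. It assumes the gene sequence is given as the coding (sense) strand written 5' to 3'; the example sequence is hypothetical and is not one of the sequences of this invention.

```python
# Translation table for Watson-Crick base pairing in DNA.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA sequence, read 5' to 3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def antisense_transcript(sense_dna):
    """Return the RNA (5' to 3') that is complementary to the mRNA
    transcribed from a gene whose coding strand is sense_dna."""
    return reverse_complement(sense_dna).replace("T", "U")


sense = "ATGGCCAAG"  # hypothetical coding-strand fragment
print(reverse_complement(sense))    # CTTGGCCAT
print(antisense_transcript(sense))  # CUUGGCCAU
```

The anti-sense transcript pairs base for base with the mRNA, which is what allows the two to hybridize.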

The use of antibodies embodied in this invention in the treatment of cancer partly relies on the
fact that genes that show RNA overabundance in cancer frequently encode cell-surface proteins.
Location of these proteins at the cell surface may correspond to an important biological function of the
cancer cell, such as their interaction with other cells, the modulation of other cell-surface proteins, or
triggering by an incoming cytokine.

These mechanisms suggest a variety of ways in which a specific antibody may be effective in
decreasing the pathology of a cancer cell. For example, if the gene encodes a growth receptor,
then an antibody that blocks the ligand binding site or causes endocytosis of the receptor would
decrease the ability of the receptor to provide its signal to the cell. It is unnecessary to have
knowledge of the mechanism beforehand; the effectiveness of a particular antibody can be predicted
empirically by testing with cultured cancer cells expressing the corresponding protein. Monoclonal
antibodies may be more effective in this form of cancer therapy if several different clones directed at
different determinants of the same cancer-associated gene product are used in combination: see PCT
application WO 94/00136 (Kasprzyk et al.). Such antibody treatment may directly decrease the
pathology of the cancer cells, or render them more susceptible to non-specific cytotoxic agents such
as platinum (Lippman).

Another example of how antibodies can be used in cancer therapy is in the specific targeting
of effector components. The protein product of the cancer-associated gene is expected to appear in
high frequency on cancer cells compared to unaffected cells, due to the overabundance of the
corresponding RNA. The protein therefore provides a marker for cancer cells that a specific antibody
can bind to. An effector component attached to the antibody therefore becomes concentrated near
the cancer cells, improving the effect on those cells and decreasing the effect on non-cancer cells.
This concentration would generally occur not only near the primary tumor, but also near cancer cells
that have metastasized to other tissue sites. Furthermore, if the antibody is able to induce
endocytosis, this will enhance entry of the effector into the cell interior.

For the purpose of targeting, an antibody specific for the protein of the cancer-associated
gene is conjugated with a suitable effector component, preferably by a covalent or high-affinity bond.
Suitable effector components in such compositions include radionuclides such as "*l, toxic chemicals
such as vincristine, and toxic peptides such as diphtheria toxin. Other suitable effector components
include peptides or polynucleotides capable of altering the phenotype of the cell in a desirable fashion:
for example, installing a tumor suppressor gene, or rendering them susceptible to immune attack.

In most applications of antibody molecules in human therapy, it is preferable to use human
monoclonals, or antibodies that have been humanized by techniques known in the art. This helps
prevent the antibody molecules themselves from becoming a target of the host's immune system.

An example of how polypeptides embodied in this invention can be effectively used in
treatment is through vaccination. The growth of cancer cells is naturally limited in part due to immune
surveillance. This refers to the recognition of cancer cells by immune recognition units, particularly
antibodies and T cells, and the consequent triggering of immune effector functions that limit tumor
progression. Stimulation of the immune system using a particular tumor-specific antigen enhances
the effect towards the tumor expressing the antigen. Thus, an active vaccine comprising a
polypeptide encoded by the cDNA of this invention would be appropriately administered to subjects
having overabundance of the corresponding RNA. There may also be a prophylactic role for the
vaccine in a population predisposed for developing cancer cells with overabundance of the same
RNA.

Ways of increasing the effectiveness of cancer vaccines are known in the art (Beardsley,
Maclean et al.). For example, synthetic antigens are conjugated to a carrier like keyhole limpet
hemocyanin (KLH), and then combined with an adjuvant such as DETOX™, a mixture of
mycobacterial cell walls and lipid A. Any polypeptide encoded by the four novel genes described in
this invention can be used in analogous compositions.

Methods for preparing and administering polypeptide vaccines are known in the art. Peptides
may be capable of eliciting an immune response on their own, or they may be rendered more
immunogenic by chemical manipulation, such as cross-linking or attaching to a protein carrier like
KLH. Preferably, the vaccine also comprises an adjuvant, such as alum, muramyl dipeptides,
liposomes, or DETOX™. The vaccine may optionally comprise auxiliary substances such as wetting
agents, emulsifying agents, and organic or inorganic salts or acids. It also comprises a
pharmaceutically acceptable excipient which is compatible with the active ingredient and appropriate
for the route of administration. The desired dose for peptide vaccines is generally from 10 µg to 1 mg,
with a broad effective latitude. The vaccine is preferably administered first as a priming dose, and
then again as a boosting dose, usually at least four weeks later. Further boosting doses may be given
to enhance the effect. The dose and its timing are usually determined by the person responsible for
the treatment.

Sequence data and deposits

The foregoing detailed description provides, inter alia, a detailed explanation of how genes 
associated with cancer can be identified and their cDNA obtained. Polynucleotide sequences for
CH1-9a11-2, CH8-2a13-1, CH13-2a12-1, and CH14-2a16-1 are provided.

The sequence data listed in this application were obtained by two-directional sequencing,
except where indicated otherwise. The data are believed to be accurate; nevertheless, it is readily
appreciated that the techniques of the art as used herein have the potential of introducing occasional
and infrequent sequence errors. Clones and inserts obtained via PCR may also comprise occasional
errors introduced during amplification. Nucleotide sequences predicted from database compilations,
and sequence data obtained by one-directional sequencing, may also contain occasional errors in
accordance with the limitations of the underlying techniques. In addition, allelic variations to both
nucleotide and amino acid sequences may occur naturally or be deliberately induced. Differences of
any of these types between the sequences provided herein and the invention as practiced may be
present without departing from the spirit of the invention.

Sequence data for CH8-2a13-1 and CH13-2a12-1 cDNA are believed to comprise the entire
translated coding sequence, and 5' and 3' untranslated regions corresponding to those found in
typical mRNA transcripts. Multiple mRNA transcripts may be found depending on the patterns of
transcript processing in various cell types of interest. Sequence data for CH1-9a11-2 and
CH14-2a16-1 cDNA comprise a portion of the coding sequence and 3' untranslated regions.
Additional sequence is typically present in the corresponding mRNA transcripts, comprising an
additional coding region in the N-terminal direction of the protein, and possibly a 5' untranslated
region.

Certain embodiments of this invention may be practiced by polynucleotide synthesis
according to the data provided herein, by rescuing an appropriate insert corresponding to the gene of
interest from one of the deposits listed below, or by isolating a corresponding polynucleotide from a
suitable tissue source. Various useful probes and primers for use in polynucleotide isolation are
provided herein, or may be designed from the sequence data.

Three deposits were made on May 31, 1996 with the American Type Culture Collection
(ATCC), 12301 Parklawn Drive, Rockville, Maryland 20852 under terms of the Budapest Treaty. The
deposits are outlined in Table 2:



TABLE 2: ATCC Deposits

BCGF1 (Accession No. 98074)

Mixture of E. coli with recombinant plasmids of cDNA fragments of genes
associated with breast cancer. The 8 recombinant plasmids may be separated
by plating on ampicillin plates and selecting single colonies for analysis by PCR
using SP6 and T7 primers.

    Gene           Subclone      Expected size of PCR product
    CH1-9a11-2     pch1-1.1      1.1 kb
                   pch1-2.5      2.5 kb
    CH8-2a13-1     pch8-600      600 bp
                   pch8-3k       3.0 kb
                   pch8-4k       4.0 kb
    CH14-2a16-1    pch14-800     800 bp
                   pch14-1.6     1.6 kb
                   pch14-1.3     1.3 kb

BCGF2 (Accession No. 97595)

Mixture of λgt10 recombinant phages with cDNA inserts of genes associated
with breast cancer. The 2 phages may be separated by growing in the E. coli
host (strain NM514) and plating out for single plaques. These plaques can be
distinguished by PCR using λgt10 reverse and forward primers.

    Gene           Phage         Expected size of PCR product
    CH13-2a12-1    λch13-3.5     3.5 kb
    CH14-2a16-1    λch14-2.5     2.5 kb

λBCBT474 (Accession No. 97594)

cDNA library derived from breast cancer cell line BT474 in λgt10 vector,
supplemented with a cDNA library from breast cancer cell line 600PE in λgt10
vector. The cDNA insert sizes range from about 0.5 to 5 kb.
λBCBT474 is a source of additional cDNA inserts corresponding to
CH1-9a11-2, CH8-2a13-1, CH13-2a12-1, or CH14-2a16-1 not present in
BCGF1 or BCGF2.



Sequence databases contain sequences of polynucleotide and polypeptide fragments with
various degrees of identity and overlap with certain embodiments of this invention. The following list
of accession numbers is provided for the interest of the reader; it is not intended to be comprehensive
or a limitation on the invention. The database disclosures do not typically indicate use in cancer
diagnosis, drug development or disease treatment.

The following GenBank accession numbers are listed in relation to CH1-9a11-2: dbEST
N32686; N45113; N36176; N22982; AA278830; H88670; AA235936; AA236951; H26301; N28026;
H88063; H88064; D61948; H88718; H26460; AA137920; AA145308; W12952; AA200687; N44164;
T27279; dbSTS G22044; G04961.

The following GenBank accession numbers are listed in relation to CH8-2a13-1: dbNR
D83780.

The following GenBank accession numbers are listed in relation to CH13-2a12-1: dbNR
U58090; dbEST AA182441; AA253924; AA179755; AA112715; AA112640; W67977; AA150317;
W68080; AA150243; AA100446; W69636; H46574; AA245889; AA100651; H77368; AA192778;
T85671; N32682; T86257; T78239; T77874; AA187865; Z33557; R40816; N99802; R19302;
AA100650; N55904; AA257151; H77369; T79014.
The following GenBank accession numbers are listed in relation to CH14-2a16-1: dbEST
N64802; W66903; N31400; W95674; AA233551; AA233636; N24105; W03447; W25821; AA233666;
AA233647; N67843; D55778; T66839; N55370; N75650; AA280736; H97110; Z19643; H91250;
AA230765; R93089; T84665; W94857; R92873.

The examples presented below are provided as a further guide to a practitioner of ordinary
skill in the art, and are not meant to be limiting in any way.

Examples 

Example 1: Selecting cDNA for messenger RNA that is overabundant in breast cancer cells

Total RNA was isolated from each breast cancer cell line or control cell by centrifugation
through a gradient of guanidine isothiocyanate/CsCl. The RNA was treated with RNase-free DNase
(Promega, Madison, WI). After extraction with phenol-chloroform, the RNA preparations were stored
at -70°C. Oligo-dT polynucleotides for priming at the 3' end of messenger RNA with the sequence
T11NM (where N ∈ {A, C, G} and M ∈ {A, C, G, T}) were synthesized according to standard protocols.
Arbitrary decamer polynucleotides (OPA01 to OPA20) for priming towards the 5' end were purchased
from Operon Biotechnology, Inc., Alameda, CA.
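The anchored oligo-dT primer set can be enumerated with a short sketch. It assumes the primer formula reads as a run of 11 T residues followed by the two anchoring bases N and M; the function name and the `t_length` parameter are illustrative only.

```python
from itertools import product


def anchored_oligo_dt_primers(t_length=11):
    """Enumerate anchored oligo-dT primers of the form T(n)NM, written
    5' to 3'. t_length=11 is an assumption about the primer formula."""
    n_bases = "ACG"   # N: any base except T
    m_bases = "ACGT"  # M: any base, at the 3' terminus
    return ["T" * t_length + n + m for n, m in product(n_bases, m_bases)]


primers = anchored_oligo_dt_primers()
print(len(primers))  # 12 distinct anchored primers
for primer in primers:
    print(primer)
```

Excluding T at position N stops the primer from sliding along the poly(A) tail, so each primer anchors at the junction between the tail and the transcript.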

The RNA was reverse-transcribed using AMV reverse transcriptase (obtained from BRL) and
an anchored oligo-dT primer in a volume of 20 µL, according to the manufacturer's directions. The
reaction was incubated at 37°C for 60 min and stopped by incubating at 95°C for 5 min. The cDNA
obtained was used immediately or stored frozen at -70°C.

Differential display was conducted according to the following procedure: 1 µL cDNA was
replicated in a total volume of 10 µL PCR mixture containing the appropriate T11NM sequence, 0.5 µM
of a decamer primer, 200 µM dNTP, 5 µCi [35S]-dATP (Amersham), Taq polymerase buffer with 2.5
mM MgCl2, and 0.3 unit Taq polymerase (Promega). Forty cycles were conducted in the following
sequence: 94°C for 30 sec, 40°C for 2 min, 72°C for 30 sec; and then the sample was incubated at
72°C for 5 min. The replicated cDNA was separated on a 6% polyacrylamide sequencing gel. After
electrophoresis, the gel was dried and exposed to X-ray film.
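The cycling program can be written out as a small data structure. This sketch assumes the cycling conditions read as 94°C for 30 sec, 40°C for 2 min and 72°C for 30 sec over forty cycles, followed by 72°C for 5 min; temperature ramp times are ignored, and all names are illustrative.

```python
# Each step is (temperature in degrees C, duration in seconds).
CYCLE_STEPS = [
    (94, 30),   # denaturation
    (40, 120),  # low-stringency annealing
    (72, 30),   # extension
]
NUM_CYCLES = 40
FINAL_EXTENSION = (72, 300)


def program_seconds(cycle_steps, num_cycles, final_step):
    """Total hold time of the program, excluding ramping between steps."""
    per_cycle = sum(duration for _, duration in cycle_steps)
    return num_cycles * per_cycle + final_step[1]


total = program_seconds(CYCLE_STEPS, NUM_CYCLES, FINAL_EXTENSION)
print(total)       # 7500 seconds of hold time
print(total / 60)  # 125.0 minutes
```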

The autoradiogram was analyzed for labeled cDNA that was present in larger relative amount
in all of the lanes corresponding to breast cancer cells, compared with all of the lanes corresponding
to control cells. Figure 1 provides an example of an autoradiogram from such an experiment. Lane
1 is from non-proliferating normal breast cells; lane 2 is from proliferating normal breast cells; lanes
3 to 5 are from breast cancer cell lines BT474, SKBR3, and MCF7. The left and right sides show
the patterns obtained from experiments using the same T11NM sequence (T11AC), but two different
decamer primers. The arrows indicate the cDNA fragments that were more abundant in all three
tumor lines compared with controls.

The assay illustrated in Figure 1 was conducted using different combinations of oligo-dT
primers and decamer primers. A number of differentially expressed bands were detected when
different primer combinations were used. However, not all differences seen initially were
reproducible after re-screening. We therefore routinely repeated each differential display for each
primer combination. Only bands showing RNA overabundance in at least 2 experiments were
selected for further analysis.
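The selection rule can be sketched as follows. This is an illustration under stated assumptions: band intensities are taken to be already quantified per lane, "larger relative amount" is read as every cancer lane exceeding every control lane, and the band names, lane names and values are hypothetical.

```python
from collections import Counter


def overabundant_bands(experiment, cancer_lanes, control_lanes):
    """Bands whose signal in every cancer lane exceeds the signal in
    every control lane of one differential display experiment."""
    selected = set()
    for band, lanes in experiment.items():
        lowest_cancer = min(lanes[lane] for lane in cancer_lanes)
        highest_control = max(lanes[lane] for lane in control_lanes)
        if lowest_cancer > highest_control:
            selected.add(band)
    return selected


def reproducible_bands(experiments, cancer_lanes, control_lanes, min_repeats=2):
    """Bands selected in at least min_repeats independent experiments."""
    counts = Counter()
    for experiment in experiments:
        counts.update(overabundant_bands(experiment, cancer_lanes, control_lanes))
    return {band for band, n in counts.items() if n >= min_repeats}


cancer = ["BT474", "SKBR3", "MCF7"]
controls = ["normal_resting", "normal_proliferating"]

# Hypothetical band intensities from two repeats of one primer combination.
run_1 = {
    "band_a": {"BT474": 9, "SKBR3": 8, "MCF7": 7, "normal_resting": 2, "normal_proliferating": 3},
    "band_b": {"BT474": 9, "SKBR3": 2, "MCF7": 7, "normal_resting": 2, "normal_proliferating": 3},
}
run_2 = {
    "band_a": {"BT474": 8, "SKBR3": 9, "MCF7": 6, "normal_resting": 1, "normal_proliferating": 2},
    "band_b": {"BT474": 8, "SKBR3": 9, "MCF7": 6, "normal_resting": 1, "normal_proliferating": 2},
}

print(reproducible_bands([run_1, run_2], cancer, controls))  # {'band_a'}
```

Here band_b passes in the second run only, so it is dropped, which mirrors the decision to discard differences that were not reproducible on re-screening.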

It is preferable to include in the differential display experiment RNA derived from uncultured
normal mammary epithelial cells (termed "organoids"). These cells are obtained from surgical
samples resected from healthy breast tissue, which are then coaxed apart by blunt dissection
techniques and mild enzyme treatment. Using organoids as the negative control, 33 cDNA
fragments were isolated from 15 displays.

Example 2: Sub-selecting cDNA that corresponds to genes that are duplicated in breast
cancer cells

cDNA fragments that were differentially expressed in the fashion described in Example 1
were excised from the dried gel and extracted by boiling at 95°C for 10 min. Eluted cDNA was
recovered by ethanol precipitation, and replicated by PCR. The product was cloned into the pCRII
vector using the TA cloning system (Invitrogen).

EcoRI digested placenta DNA, and EcoRI digested DNA from the breast cancer cell lines
BT474, SKBR3 and ZR-75-30 were used to prepare Southern blots to screen the cloned cDNA
fragments. The cloned cDNA fragments were labeled with [32P]-dCTP, and used individually to probe
the blots. A larger relative amount of binding of the probe to the lanes corresponding to the cancer
cell DNA indicated that the corresponding gene had been duplicated in the cancer cells. The labeled
cDNA probes were also used in Northern blots to verify that the corresponding RNA was
overabundant in the appropriate cell lines.

To determine whether the cDNA fragments obtained by this selection procedure
corresponded to novel genes, a partial nucleotide sequence was obtained using M13 primers.
Each sequence was compared with the known sequences in GenBank. In initial experiments, 5 of
the first 7 genes sequenced were mitochondrial genes. To avoid repeated isolation of
mitochondrial genes, subsequent screening experiments were done with additional lanes in the
DNA blot analysis for EcoRI digested and HindIII digested mitochondrial DNA. Any cDNA fragment
that hybridized to the appropriate mitochondrial restriction fragments was suspected of
corresponding to a mitochondrial gene, and not analyzed further.

From the 33 cDNA fragments detected from differential displays using organoid mRNA, 12
were subcloned. Of these 12, 6 detected suitable gene duplications in the appropriate cell lines.
Three cDNA failed to detect duplicated genes, and 3 appeared to correspond to mitochondrial
genes. Sequence analysis of the 6 suitable cDNA fragments showed no identity to any known
genes.

To obtain longer cDNA corresponding to the cDNA fragments with novel sequences, the
fragments were used as probes to screen a cDNA library from breast cancer cell line BT474,
constructed in lambda GT10. The longer cDNA obtained from lambda GT10 were sequenced
using lambda GT10 primers. The chromosomal locations of the cDNAs were determined using
panels of somatic cell hybrids.

Four of the 6 novel cDNA identified so far have been processed in this fashion. The 
probes used to obtain the 4 new breast cancer genes are shown in Table 3. 



TABLE 3: Primers used for Differential Display

    cDNA           Oligo-dT primer           Arbitrary primer
    CH1-9a11-2     T11CC (SEQ ID NO:9)       SEQ ID NO:11
    CH8-2a13-1     T11AC (SEQ ID NO:10)      SEQ ID NO:12
    CH13-2a12-1    T11AC (SEQ ID NO:10)      SEQ ID NO:13
    CH14-2a16-1    T11AC (SEQ ID NO:10)      SEQ ID NO:14

Example 3: Using the cDNA to test panels of breast cancer cells

To determine the proportion of breast cancers in which the putative breast cancer genes
were duplicated, or showed RNA overabundance without gene duplication, the four cDNA obtained
according to the selection procedures described were used to probe a panel of breast cancer cell
lines and primary tumors.

Gene duplication was detected either by Southern analysis or slot-blot analysis. For
Southern analysis, 10 µg of EcoRI digested genomic DNA from different cell lines was
electrophoresed on 0.8% agarose and transferred to a HYBOND™ N+ membrane (Amersham). The
filters were hybridized with 32P-labeled cDNA for the putative breast cancer gene. After an
autoradiogram was obtained, the probe was stripped and the blot was re-probed using a reference
probe to adjust for differences in sample loading. Either chromosome 2 probe D2S5 or chromosome
21 probe D21S6 was used as a reference. Densities of the signals on the autoradiograms were
obtained using a densitometer (Molecular Dynamics). The density ratio between the breast cancer
gene and the reference gene was calculated for each sample. Two samples of placental DNA digests
were run in each Southern analysis as a control.

For slot-blot analysis, 1 µg of genomic DNA was denatured and slotted on the HYBOND™
membrane. D21S5 or human repetitive sequences were used as reference probes for slot blots. The
density ratio between the breast cancer gene and the reference gene was calculated for each sample.
10-15 samples of placental DNA digests were used as control. Amongst the control samples, the
highest density ratio was set at 1.0. The density ratios of the tumor cell lines were standardized
accordingly. An arbitrary cut-off for the standardized ratio (typically 1.3) was defined to identify
samples in which the putative gene had been duplicated. Each of the cell lines in the breast cancer
panel was scored positively or negatively for duplication of the gene being tested.
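The standardization and scoring can be sketched as follows. The densitometer readings and sample names are hypothetical, and scoring a sample positive when its standardized ratio is at or above the cut-off (rather than strictly above it) is an assumption; the text specifies only the typical cut-off value of 1.3.

```python
def density_ratios(samples):
    """Ratio of candidate-gene signal to reference-probe signal per sample.
    samples maps a sample name to (gene_density, reference_density)."""
    return {name: gene / ref for name, (gene, ref) in samples.items()}


def score_duplication(tumor_samples, control_samples, cutoff=1.3):
    """Standardize tumor ratios so that the highest control ratio is 1.0,
    then score each tumor sample against the cut-off."""
    highest_control = max(density_ratios(control_samples).values())
    scores = {}
    for name, ratio in density_ratios(tumor_samples).items():
        standardized = ratio / highest_control
        scores[name] = (round(standardized, 2), standardized >= cutoff)
    return scores


# Hypothetical densitometer readings: (candidate gene, reference probe).
placenta = {"P1": (100.0, 100.0), "P2": (80.0, 100.0)}
tumors = {"line_A": (270.0, 100.0), "line_B": (105.0, 100.0)}

for name, (ratio, duplicated) in score_duplication(tumors, placenta).items():
    print(name, ratio, "+" if duplicated else "-")
```

Dividing by the highest control ratio rather than the mean makes the scoring conservative: a tumor sample must exceed every placental control by the cut-off factor before it is scored as duplicated.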

Some of the cell lines in the panel were known to have duplicated chromosomal regions from
comparative genomic hybridization analysis. In instances where the cDNA being used as probe
mapped to the known amplified region, the cDNA indicated that the corresponding gene had also
been duplicated. However, duplicated genes were also detected using each of the four cDNAs in
instances where comparative genomic hybridization had not revealed any amplification.

Because of the nature of the technique, the standardized ratio calculated as described
underestimates the gene copy number, although it is expected to rank in the same order. For
example, the standardized ratio obtained for the c-myc gene in the SKBR3 breast cancer cell was 5.0.
However, it is known that SKBR3 has approximately 50 copies of the c-myc gene.

To test for overabundance of RNA, 10 ng of total RNA from breast cancer cell lines or primary
breast cancer tumors were electrophoresed on 0.8% agarose in the presence of the denaturant
formamide, and then transferred to a nylon membrane. The membrane was probed first with
32P-labeled cDNA corresponding to the putative breast cancer gene, then stripped and reprobed with
32P-labeled cDNA for the beta-actin gene to adjust for differences in sample loading. Ratios of
densities between the candidate gene and the beta-actin gene were calculated. RNA from three
different cultured normal epithelial cells were included in the analysis as a control for the normal level
of gene expression. The highest ratio obtained from the normal cell samples was set at 1.0, and the
ratios in the various tumor cells were standardized accordingly.

Example 4: Chromosome 1 gene CH1-9a11-2

One of the cDNA obtained through the selection procedures of Examples 1 and 2
corresponded to a gene that mapped to Chromosome 1.

Table 4 summarizes the results of the analysis for gene duplication and RNA overabundance.
Both quantitative and qualitative assessments are shown. The numbers shown were obtained by
comparing the autoradiograph intensity of the hybridizing band in each sample with that of the
controls. Several control samples were used for the gene duplication experiments, consisting of
different preparations of placental DNA. The control sample with the highest level of intensity was
used for standardizing the other values. Other sources used for this analysis were breast cancer cell
lines with the designations shown. For reasons stated in Example 3, the quantitative number is not a
direct indication of the gene copy number, although it is expected to rank in the same order. Similarly,
up to 6 control samples were used for the RNA overabundance experiments, consisting of different
preparations of breast cell organoids which had been maintained briefly in tissue culture until the
experiment was performed. The control sample with the highest level of intensity was used for
standardizing the other values. Each cell line was scored + or - according to an arbitrary cut-off value.

TABLE 4: Chromosome 1 Gene in Breast Cancer Cell Lines

                     CH1-9a11-2           CH1-9a11-2 RNA overabundance
    Source           gene duplication     5.2 kb            4.4 kb
    Normal               1.00*                1.00**            1.00**
    BT474            +   2.70                 1.57              0.1
    ZR-75-30         +   2.65                 nd                nd
    MDA453               2.86                 S 7d              0.2
                         3.72                                   2.4
                     +   1.86                 0.94
    600PE            +   1.72                 4.47          +   6.8
    MDA157               1.49                 1.08          +   1.4
    MCF7                 1.95                 nd                nd
    DU4475           +   2.02                 1.13          +   1.5
    MDA231               1.23             +   1.47
    BT20                 1.09                 0.83              1.9
    T47D                 1.05                 nd                nd
    UACC812              0.67             +   1.57          +   1.6
    MDA134               1.19             +   5.04          +   7.1
    CAMA-1               1.02             +   2.51          +   7.2
    Incidence (%)    9/15 (60%)           7/12 (58%)        11/12 (92%)

+ Gene duplication or RNA overabundance; - no duplication or overabundance; nd - not done
*  Degree of gene duplication is reported relative to placental DNA preparations.
** Degree of RNA overabundance is reported relative to the highest level observed for
   several cultures of normal epithelial cells. Two hybridizing species of RNA
   are calculated and reported separately.



The gene corresponding to the CH1-9a11-2 cDNA was duplicated in 9 out of 15 (60%) of the
breast cancer cell lines tested, compared with placental DNA digests (P3 and P12). The sequence of
the 115 bases from the 5' end of the cDNA fragment (SEQ. ID NO:1) is shown in Figure 22. There
was no substantial homology to any known gene in GenBank. One of the three possible reading
frames was found to be open, with the predicted amino acid sequence shown in Figure 22 (SEQ. ID NO:2).

The CH1-9a11-2 gene was further characterized by obtaining additional sequence
information. A λ-GT10 cDNA library from the breast cancer cell line BT474 (Example 2) was
screened using the initial cDNA insert, and a clone with a 2.5 kilobase insert was identified. The
identified clone was subcloned into plasmid vector pCRII. T7 and Sp6 primers for regions flanking the
cDNA inserts were used as initial sequencing primers:



T7 primer (SEQ. ID NO:42)
5'-TAATACGACTCACTATAGGGAGA-3'
Sp6 primer (SEQ. ID NO:43)
5'-CATACGATTTAGGTGACACTATAG-3'
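
As a quick sanity check on the two sequencing primers, their length, GC content and reverse complement can be computed with a few lines of code. The sketch below assumes the Sp6 primer reads ...AGGTGACAC..., the usual Sp6 promoter sequence; the helper names are illustrative only.

```python
# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_fraction(seq):
    """Return the fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq):
    """Return the reverse complement, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


T7_PRIMER = "TAATACGACTCACTATAGGGAGA"    # SEQ. ID NO:42
SP6_PRIMER = "CATACGATTTAGGTGACACTATAG"  # SEQ. ID NO:43 (assumed reading)

for name, primer in (("T7", T7_PRIMER), ("Sp6", SP6_PRIMER)):
    print(name, len(primer), round(gc_fraction(primer), 3),
          reverse_complement(primer))
```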



Sequencing continued by walking along the region of interest by standard techniques, using
sequencing primers based on data already obtained. Primers used in sequencing are designated 1-16
in Figure 7.

A second clone (designated pCH1-1.1) overlapping on the 5' end was obtained using the
CLONTECH Marathon™ cDNA Amplification Kit. A map showing the overlapping regions is provided
in Figure 6. Briefly, two DNA primers designated CH1a and CH1b (Figure 7) were synthesized.
Polyadenylated RNA from breast cancer cell line 600PE was reverse transcribed using the CH1b primer.
After second strand synthesis, adaptor DNA provided in the kit was ligated to the double-stranded
cDNA. The 5' end cDNA of CH1-9a11-2 was then amplified by PCR using primers CH1a and AP1
(provided in the kit). To increase the specificity of the PCR products, the first PCR products were
reamplified by PCR using nested primers CH1a and AP2 (provided in the kit). The PCR products were
cloned into pCRII vector (Invitrogen) and screened with a CH1-9a11-2 probe.

The sequence of 3452 base pairs between the 5' end of pCH1-1.1 and the poly-A tail of
CH1-9a11-2 was determined by standard sequencing techniques. The DNA sequence is shown in Figure
8 (SEQ. ID NO:15). The longest open reading frame is in frame 1 (bases 1-1875), and codes for 624
amino acids before the stop codon. The corresponding amino acid sequence of this frame is shown
in the upper panel of Figure 9 (SEQ. ID NO:16). The partial sequence predicted for the translated
protein is listed in the lower panel of Figure 9 (SEQ. ID NO:17). Bases 1876 to the end of the sequence
are believed to be a 3' untranslated region. A hydrophobicity analysis identified a putative membrane
insertion or membrane spanning region at about amino acids 382-400, indicated in Figure 9 by
underlining.
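
A search for the longest open reading frame across the three forward frames can be sketched as below. This is a minimal illustration on a made-up 18-base sequence, not the procedure used to analyze SEQ. ID NO:15. It assumes that "open" simply means free of stop codons (no start codon is required), and it reports the end position as the last base before the stop codon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_open_stretch(seq):
    """Return (frame, start, end, codons) for the longest stop-free run.

    frame is 1, 2 or 3; start and end are 1-based base positions, and
    end is the last base before the stop codon (or of the last whole codon).
    """
    seq = seq.upper()
    best = (0, 0, 0, 0)
    for offset in range(3):
        run_start = offset
        pos = offset
        while pos + 3 <= len(seq):
            if seq[pos:pos + 3] in STOP_CODONS:
                codons = (pos - run_start) // 3
                if codons > best[3]:
                    best = (offset + 1, run_start + 1, pos, codons)
                run_start = pos + 3  # resume after the stop codon
            pos += 3
        # Account for a run that reaches the end of the sequence.
        codons = (pos - run_start) // 3
        if codons > best[3]:
            best = (offset + 1, run_start + 1, pos, codons)
    return best


# Hypothetical toy sequence: five codons followed by a TAA stop in frame 1.
toy = "ATGGCTGACCTGAAATAA"
print(longest_open_stretch(toy))
```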

Figure 23 is a listing of additional cDNA sequence obtained for CH1-9a11-2, comprising
approximately 1934 base pairs 5' from the sequence of Figure 8. The additional sequence data was
obtained by rescuing and amplifying two further fragments of CH1-9a11-2 cDNA. Nested primers
were designed about 100 base pairs downstream from the 5' end of the known sequence. The primers
were used in a nested amplification assay using AP1 and AP2, using the CLONTECH Marathon™
cDNA Amplification Kit as described above. The template for the first upstream fragment was
reverse-transcribed polyadenylated RNA from breast cancer cell line 600PE, as described earlier.
This fragment was sequenced, and another set of nested primers was designed. The template for the
next upstream fragment was a Marathon™ ready cDNA preparation from human testes, also supplied
by CLONTECH.

The nucleotide sequence shown in Figure 23 comprises an open reading frame through to
the 5' end. Figure 24 shows the corresponding protein translation. About another 500-1000
bases are predicted to be present in CH1-9a11-2 in the 5' direction, with the protein encoding sequence
beginning somewhere within this additional sequence. Sequencing of the encoding region is
completed by obtaining additional CH1-9a11-2 fragments in this direction.

A GENINFO® BLAST search of nucleotide and peptide sequence databases was performed
through the National Center for Biotechnology Information on February 23, 1996. Short segments of
homology with other reported human sequences were found at the nucleotide level (<500 base pairs),
but none with any ascribed function in the respective identifier. At the amino acid level, no identity
higher than 30% was found with any reported eukaryotic sequences.

A CH1-9a11-2 cloned insert has been used to probe the level of relative expression in
polyadenylated RNA from a panel of tissue sources. The RNA was obtained already prepared for
Northern blot analysis (CLONTECH Catalog # 7759-1, 7760-1 and 7756-1). The manufacturer
produced the blots from approximately 2 µg of poly-A RNA per lane, run on a denaturing
formaldehyde 1.2% agarose gel, transferred to a nylon membrane, and fixed by UV irradiation. The
relative CH1-9a11-2 expression observed at the RNA level is shown in Table 5:


TABLE 5: Northern blot analysis

Tissue              CH1-9a11-2 mRNA

heart               [illegible]
brain               +
placenta            ++
lung                +/-
liver               +/-
skeletal muscle     +
kidney              [illegible]
pancreas            +++
spleen              +
thymus              [illegible]
prostate            ++
testis              +++
ovary               ++
small intestine     +
colon               [illegible]
peripheral blood    +/-

++++  Very high
+++   High
++    Medium
+     Low
+/-   Very low

Relatively elevated levels of expression were observed in heart, placenta, pancreas, prostate, testis
and ovary. The level of expression in breast cancer cell lines is also relatively high (about [illegible] on
the scale), since the Northern analysis performed on these lines (described above) was conducted on
total cellular RNA, of which polyadenylated RNA constitutes only about 5%. It is likely that the
CH1-9a11-2 gene is involved in a biological process that is typical to the tissue types showing medium to
high levels of expression, which may relate to increased tissue growth or metabolism.

Since the obtained sequence is shorter than the apparent size of mRNA observed in
Northern analysis (Table 1), an additional polynucleotide segment is believed to be present at the 5'
end of the sequence shown in SEQ. ID NO:15. Further sequence data at the 5' end is deduced by
obtaining additional cloned cDNA using standard techniques. Briefly, in one approach, mRNA from
breast cancer cell lines MDA-453 and/or 600PE is cloned and screened using primers based on
sequence data from SEQ. ID NO:15. Two nested primers of about 20 nucleotides are prepared, the
innermost about 150 base pairs from the 5' end, and the outermost about 170 base pairs from the 5'
end. The outermost primer is used to synthesize a first cDNA strand complementary to the mRNA in
the upstream direction. Second strand synthesis is performed using reagents in a CLONTECH
Marathon™ cDNA amplification kit according to manufacturer's directions. The double-stranded DNA
is then ligated at the 5' end of the coding sequence with the double-stranded adaptor fragment
provided in the kit. A first PCR amplification (about 30 cycles) is performed using the first adapter
primer from the kit and the outermost RNA-specific primer, and a second amplification (about 30
cycles) is performed using the second adapter primer and the innermost RNA-specific primer. In an
alternative approach, a CLONTECH RACE-READY single-stranded cDNA from human placenta is
PCR amplified using nested 5' anchor primers in combination with the outermost and innermost
RNA-specific primers. Amplified DNA obtained using either approach is analyzed by gel electrophoresis,
and cloned into plasmid vector pCRII. Clones are screened, as necessary, using the 2.5 kilobase
CH1-9a11-2 insert. Clones corresponding to full-length mRNA (4.5 kb or 5.5 kb; Table 1), or cDNA
fragments overlapping at the 5' end, are selected for sequencing. Compared with the 4.5 kb form,
additional polynucleotide segments may be present in the 5.5 kb form within the encoding region, or in
the 5' or 3' untranslated region.

Example 5: Chromosome 8 gene CH8-2a13-1

One of the cDNA obtained corresponded to a gene that mapped to Chromosome 8. Figure 2
shows the Southern blot analysis for the corresponding gene in various DNA digests. Lane 1 (P12) is
the control preparation of placental DNA; the rest show DNA obtained from human breast cancer cell
lines. Panel A shows the pattern obtained using the 32P-labeled CH8-2a13-1 cDNA probe. Panel B
shows the pattern obtained with the same blot using the 32P-labeled D2S6 probe as a loading control.
The sizes of the restriction fragments are indicated on the right.

Figure 3 shows the Northern blot analysis for RNA overabundance. Lanes 1-3 show the level
of expression in cultured normal epithelial cells. Lanes 4-19 show the level of expression in human
breast cancer cell lines. Panel A shows the pattern obtained using the CH8-2a13-1 probe; panel B
shows the pattern obtained with beta-actin cDNA, a loading control.

The results are summarized in Table 6. The scoring method is the same as for Example 4. 

TABLE 6: Chromosome 8 Genes in Breast Cancer Cell Lines

                 CH8-2a13-1          CH8-2a13-1           c-myc
Source           Gene Duplication    RNA Overabundance    Gene Duplication

Normal           -   1.00*           -   1.00**               1.00*
SKBR3            +   4.25            +   4.30             +   4.73
ZR-75-30             3.82                nd               +   2.24
BT474            +   1.53                1.72             +   1.76
MDA157           +   2.02            +   3.39             +   1.39
MCF7             +   1.84                4.92             +   3.10
CAMA-1           +   3.62            +   2.14             +   1.61
MDA361           +   2.00            +   1.74                 nd
MDA468               nd                  4.50                 nd
T47D             +   1.41            +   1.58                 1.02
MDA453           +   1.83            +   3.10                 [illegible]
MDA134           +   1.30                3.70                 0.88
MDA435               2.15            +   4.94                 1.00
600PE                0.95                2.04                 0.64
UACC812          +   1.25                2.40                 0.74
MDA231               0.80                1.28             +   1.27
DU4475               0.85                0.88                 0.50
BT468                0.37                0.70                 0.23
BT20                 0.95                0.82

Incidence        12/17 (71%)         14/17 (82%)          [illegible]

+  Gene duplication or RNA overabundance; - no duplication or overabundance; nd = not done.
*  Degree of gene duplication is reported relative to placental DNA preparations.
** Degree of RNA overabundance is reported relative to the highest level observed for several cultures of
   normal epithelial cells.

The gene corresponding to CH8-2a13-1 showed clear evidence of duplication in 12 out of 17
(71%) of the cells tested. RNA overabundance was observed in 14 out of 17 (82%). Thus, 11% of
the cells had achieved RNA overabundance by a mechanism other than gene duplication.
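
The incidence arithmetic can be reproduced directly. The sketch below only rounds the two proportions and subtracts them; the function name is illustrative. Note that the difference is a lower bound on the share of lines reaching overabundance by another route if any line with duplication lacks overabundance.

```python
from fractions import Fraction


def percent(numerator, denominator):
    """Return a proportion as a whole-number percentage."""
    return round(100 * Fraction(numerator, denominator))


duplicated = percent(12, 17)     # cell lines with gene duplication
overabundant = percent(14, 17)   # cell lines with RNA overabundance
excess = overabundant - duplicated

print(duplicated, overabundant, excess)
```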

Since the known oncogene c-myc is located on Chromosome 8, the Southern analysis was
also conducted using a probe for c-myc. At least 2 of the breast cancer cells showing duplication of
the gene corresponding to CH8-2a13-1 did not show duplication of c-myc. This indicates that
the gene corresponding to CH8-2a13-1 is not part of the myc amplicon.

The sequence of 150 bases from the 5' end of the cDNA fragment is shown in Figure 22
(SEQ ID NO:3). There was no substantial homology to any known gene in GenBank. One of the
three possible reading frames was found to be open, with the amino acid sequence shown in Figure
22 (SEQ ID NO:4).

The CH8-2a13-1 gene was further characterized by obtaining additional sequence
information. A λ-GT10 cDNA library from the breast cancer cell line BT474 (Example 2) was
screened using the initial cDNA insert, and clones with a 3.0 kb and a 4.0 kb insert were identified.
The two identified clones were subcloned into plasmid vector pCRII. T7 and Sp6 primers for regions
flanking the cDNA inserts were used as initial sequencing primers. Sequencing continued by walking
along the region of interest by standard techniques, using sequencing primers based on data already
obtained. The two inserts were found to overlap (Figure 6). Primers used are those designated 1-25
in Figure 10.

A third clone of about 600 bp (designated pCH8-600) overlapping on the 5' end (Figure 6)
was obtained using the CLONTECH Marathon™ cDNA Amplification Kit. Briefly, two DNA primers, CH8a
and CH8b (Figure 10), were synthesized. Polyadenylated RNA from breast cancer cell line BT474
was reverse transcribed using the CH8b primer. After second strand synthesis, adaptor DNA provided in
the kit was ligated to the double-stranded cDNA. The 5' end cDNA of CH8-2a13-1 was then amplified
by PCR using primers CH8a and AP1 (provided in the kit). To increase the specificity of the PCR
products, the first PCR products were reamplified by PCR using nested primers CH8a and AP2
(provided in the kit). The PCR products were cloned into pCRII vector (Invitrogen) and screened with
a CH8-2a13-1 probe.

By sequencing relevant portions of the three clones, a nucleic acid sequence of 3982 base
pairs between the 5' end and the poly-A tail of CH8-2a13-1 was determined. The DNA sequence is
shown in Figure 11 (SEQ. ID NO:18). Bases 1-152 are believed to be a 5' untranslated region. The
longest open reading frame is in frame 3 from base 153 to 3911, and codes for 1252 amino acids
before the stop codon. The corresponding amino acid sequence of this frame is shown in the upper
panel of Figure 12 (SEQ. ID NO:19). The sequence predicted for the translated protein is shown in
the lower panel of Figure 12 (SEQ. ID NO:20).
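
The relationship between the reading frame coordinates and the stated protein length can be checked with simple arithmetic. The sketch below assumes the quoted base range includes the stop codon, an assumption that is consistent with the figures given here (bases 153-3911, 1252 amino acids) and with the frame reported in Example 4 (bases 1-1875, 624 amino acids). The function name is illustrative.

```python
def amino_acids_before_stop(first_base, last_base):
    """Count coding codons in a 1-based, inclusive base range.

    The range is assumed to end with the stop codon, so one codon is
    subtracted from the total.
    """
    length = last_base - first_base + 1
    if length % 3 != 0:
        raise ValueError("range is not a whole number of codons")
    return length // 3 - 1


print(amino_acids_before_stop(153, 3911))  # CH8-2a13-1, frame 3
print(amino_acids_before_stop(1, 1875))    # CH1-9a11-2, frame 1
```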

A GENINFO® BLAST search of nucleotide and peptide sequence databases was performed
through the National Center for Biotechnology Information on March 26, 1996. The sequences were
found to be about 99% identical at the nucleotide and amino acid level with bases 343-4103 of
KIAA0196 protein (N. Nomura et al., in press; sequence submitted to the DDBJ/EMBL/GenBank
databases on March 4, 1996). KIAA0196 was one of 200 different cDNA cloned at random from
an immature male human myeloblast cell line. KIAA0196 has no known biological function, and is
described by Nomura et al. as being ubiquitously expressed.

A fourth clone of about 600 bp overlapping pCH8-600 at the 5' end has also been obtained.
Briefly, a DNA primer was synthesized corresponding to about the first 20 nucleotides at the 5' end of the
predicted cDNA sequence, and used along with a primer based on the pCH8-600 sequence to
reverse-transcribe RNA from breast cancer cell line BT474. The product was cloned into pCRII vector
(Invitrogen) and screened with a CH8-2a13-1 probe. The new clone is sequenced along both strands
to obtain additional 5' untranslated sequence data for the cDNA. The predicted compiled
nucleotide sequence of CH8-2a13-1 cDNA is shown in Figure 13 (SEQ. ID NO:21). The
corresponding amino acid sequence of this frame is shown in Figure 14 (SEQ. ID NO:22). A
polynucleotide comprising the compiled sequence is assembled by joining the insert of this fourth
clone to pCH8-4k within the shared region. Briefly, pCH8-4k is cut with XbaI and NotI. The fourth
clone is cut with BamHI and XbaI. The ligated polynucleotide is then inserted into pCRII cut with
BamHI and NotI.

A CH8-2a13-1 cloned insert has been used to probe the level of relative expression in
polyadenylated RNA from a panel of tissue sources obtained from CLONTECH, as in Example 4.
The relative CH8-2a13-1 expression observed at the mRNA level is shown in Table 7:



TABLE 7: Northern blot analysis

Tissue              CH8-2a13-1 mRNA

heart               [illegible]
brain               [illegible]
placenta            [illegible]
lung                +
liver               +/-
skeletal muscle     +/-
kidney              +/-
pancreas            [illegible]
spleen              +
thymus              +
prostate            +
testis              [illegible]
ovary               +
small intestine     [illegible]
colon               +
peripheral blood    +/-

++++  Very high
+++   High
++    Medium
+     Low
+/-   Very low


Relative levels of expression observed were as follows: Low levels of expression were observed in
adult peripheral blood leukocytes (PBL), brain, placenta, lung, liver, skeletal muscle, kidney, and
pancreas. Medium levels of expression were observed in adult heart, spleen, thymus, prostate, testis,
ovary, small intestine, and colon. High levels of expression were observed in four fetal tissues tested:
brain, lung, liver and kidney. The level of expression in breast cancer cell lines is relatively high
(about [illegible] on the scale), since the Northern analysis performed on these lines was conducted on
total cellular RNA. It is likely that the CH8-2a13-1 gene is involved in a biological process that is
typical to the tissue types showing medium to high levels of expression, which may relate to increased
tissue growth or metabolism.

Example 6: Chromosome 13 gene CH13-2a12-1

One of the cDNA obtained corresponded to a gene that mapped to Chromosome 13. Figure
4 shows the Southern blot analysis for the corresponding gene in various DNA digests. Lanes 1 and
2 are control preparations of placental DNA; the rest show DNA obtained from human breast cancer
cell lines. Panel A shows the pattern obtained using the CH13-2a12-1 cDNA probe; panel B shows
the pattern using D2S6 probe as a loading control. The sizes of the restriction fragments are
indicated on the right.

Figure 5 shows the Northern blot analysis for RNA overabundance of the CH13-2a12-1 gene.
Lanes 1-3 show the level of expression in cultured normal epithelial cells. Lanes 4-19 show the level
of expression in human breast cancer cell lines. Panel A shows the pattern obtained using the
CH13-2a12-1 probe; panel B shows the pattern obtained with beta-actin cDNA, a loading control. The
apparent size of the mRNA varied depending upon conditions of electrophoresis. Full-length mRNA is
believed to occur at sizes of about 3.2 and 3.5 kb.

The results of the RNA abundance comparison are summarized in Table 8. The scoring
method is the same as for Example 4.

TABLE 8: Chromosome 13 Gene in Breast Cancer Cell Lines

                 CH13-2a12-1         CH13-2a12-1
Source           Gene Duplication    RNA Overabundance

Normal           -   1.00*           -   1.00**
600PE                2.18                5.57
BT474            +   1.60            +   3.20
SKBR3                1.58            +   [illegible]
MDA157           +   2.21            +   3.76
CAMA-1           +   1.41            +   1.99
MDA231           +   1.65            +   2.09
T47D             +   1.23            +   1.20
MDA468               nd              +   6.90
MDA361               nd              +   2.59
MDA435               0.59            +   3.41
MDA134               0.53            +   2.59
DU4475               0.75                1.79
MDA453               0.89            +   1.97
BT20                 0.37                1.04
MCF7                 0.29                1.03
UACC812              0.30                0.39
BT468                0.47                nd
ZR-75-30             0.70                nd

Incidence        7/16 (44%)          13/16 (81%)

+  Gene duplication or RNA overabundance; - no duplication or overabundance; nd = not done.
*  Degree of gene duplication is reported relative to placental DNA preparations.
** Degree of RNA overabundance is reported relative to the highest level observed for several cultures
   of normal epithelial cells.

The gene corresponding to CH13-2a12-1 was duplicated in 7 out of 16 (44%) of the cells
tested. Three of the positive cell lines (600PE, BT474, and MDA435) had been studied previously by
comparative genomic hybridization, but had not shown amplified chromatin in the region where
CH13-2a12-1 has been mapped in these studies.

RNA overabundance was observed in 13 out of 16 (81%) of the cell lines tested. Thus, 37%
of the cells had achieved RNA overabundance by a mechanism other than gene duplication.

Cells from primary breast tumors have also been analyzed for duplication of the
chromosome 13 gene. Ten of the 82 tumors analyzed (12%) were positive, confirming that
duplication of this gene is not an artifact of in vitro culture.

The sequence of 107 bases from the 5' end of the 1.5 kb cDNA fragment is shown in Figure
22 (SEQ ID NO:5). There was no substantial homology to any known gene in GenBank. One of the
three possible reading frames was found to be open, with the predicted amino acid sequence shown
in Figure 22 (SEQ ID NO:6).

The CH13-2a12-1 gene was further characterized by obtaining additional sequence
information. A λ-GT10 cDNA library from the breast cancer cell line BT474 (Example 2) was
screened using the initial cDNA insert, and clones with a 3.5 kilobase and a 1.6 kilobase insert were
identified. The two identified clones were subcloned into plasmid vector pCRII. T7 and Sp6 primers
for regions flanking the cDNA inserts were used as initial sequencing primers. Sequencing continued
by walking along the region of interest by standard techniques, using sequencing primers based on
data already obtained. The two inserts were found to overlap (Figure 6). Primers used during
sequencing are shown in Figure 15.

By sequencing relevant portions of the 3.5 and 1.6 kb clones, a nucleic acid sequence of
3339 base pairs between the 5' end and the poly-A tail of CH13-2a12-1 was determined. The DNA
sequence is shown in Figure 16 (SEQ. ID NO:23). Bases 1-520 are believed to be a 5' untranslated
region. The longest open reading frame is in frame 2 from base 521 to 1838, and codes for 611
amino acids before the stop codon. The corresponding amino acid sequence of this frame is shown
in the upper panel of Figure 17 (SEQ. ID NO:24). The sequence predicted for the translated protein is
shown in the lower panel of Figure 17 (SEQ. ID NO:25). Bases 1838 to 3339 of the nucleotide
sequence are believed to be a 3' untranslated region, which is present in the 3.5 kb insert. The 3.5 kb
insert appears to be a splice variant (Figure 6), in which the 3' untranslated region consists of bases
1838-2797 in the sequence.

A GENINFO® BLAST search of nucleotide and peptide sequence databases was performed
through the National Center for Biotechnology Information on March 26, 1996. Short segments of
homology with other reported human sequences were found at the nucleotide level (<500 base pairs),
but none with any ascribed function in the respective identifier. At the amino acid level, the sequence
was found to share 33% identities and 54% positives with 228 residues of the lin-19 protein of
Caenorhabditis elegans. This protein has been implicated in regulating the cell cycle of C. elegans
(ET Kipreos, W He & EM Hedgecock). The CH13-2a12-1 gene is suspected of a role in controlling
cell proliferation. "Controlling cell proliferation" in this context means that an abnormally high or low
level of gene expression at the RNA or protein level results in a higher or lower rate of cell
proliferation, or vice versa, compared with cells with an otherwise similar phenotype. There is also a
low-level homology between CH13-2a12-1 and VACM-1, a vasopressin-activated, calcium-mobilizing
receptor from rabbit kidney medulla (Burnatowska-Hledin et al.). VACM-1 has a transmembrane
sequence, whereas none has been detected in CH13-2a12-1. Nevertheless, it is possible that the
CH13-2a12-1 protein product has a Ca2+ binding or Ca2+ mobilizing function.
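
For readers unfamiliar with the "identities" figure in a BLAST report, the sketch below shows how percent identity is computed over the columns of a pairwise alignment. The two aligned strings are invented for illustration and are not taken from the CH13-2a12-1 or lin-19 sequences. The "positives" figure additionally counts conservative substitutions scored by a substitution matrix and is not reproduced here.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns holding the same residue in both rows.

    Gap characters ('-') never count as matches, and the denominator is
    the full alignment length, gaps included.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(1 for a, b in zip(aligned_a, aligned_b)
                  if a == b and a != "-")
    return 100.0 * matches / len(aligned_a)


# Hypothetical aligned peptide fragments, seven columns long.
query = "MKT-LLV"
subject = "MRTALLI"
print(round(percent_identity(query, subject)))

# Residue counts implied by the percentages reported over 228 residues.
identical = round(0.33 * 228)
positive = round(0.54 * 228)
print(identical, positive)
```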

A CH13-2a12-1 cloned insert has been used to probe the level of relative expression in
polyadenylated RNA from a panel of tissue sources obtained from CLONTECH, as in Example 4.
The relative CH13-2a12-1 expression observed at the mRNA level is shown in Table 9:



TABLE 9: Northern blot analysis

Tissue              CH13-2a12-1 mRNA

heart               [illegible]
brain               +
placenta            [illegible]
lung                +
liver               [illegible]
skeletal muscle     ++++
kidney              +
pancreas            ++
spleen              ++
thymus              ++
prostate            ++
testis              +++
ovary               [illegible]
small intestine     ++
colon               +
peripheral blood    +

++++  Very high
+++   High
++    Medium
+     Low
+/-   Very low


Relatively elevated levels of expression were observed in heart, skeletal muscle and testis.
The level of expression in breast cancer cell lines is relatively high (about [illegible] on the scale), since
the Northern analysis performed on these lines was conducted on total cellular RNA. It is likely that
the CH13-2a12-1 gene is involved in a biological process that is typical to the tissue types showing
medium to high levels of expression, which may relate to increased tissue growth or metabolism.

Fragments corresponding to the CH13-2a12-1 gene have also been used to screen cell lines
derived from other types of cancer. Southern analysis showed that about 1 out of 4 breast cancer cell
lines tested have gene duplication of CH13-2a12-1. Northern analysis showed that about 3 out of 6
lines tested have overexpression of the corresponding RNA transcript.

Example 7: Chromosome 14 gene CH14-2a16-1

One of the cDNA obtained corresponded to a gene that mapped to Chromosome 14. Results
of the analysis are summarized in Table 10. The scoring method is the same as for Example 4.



TABLE 10: Chromosome 14 Gene in Breast Cancer Cell Lines

                 CH14-2a16-1         CH14-2a16-1
Source           Gene Duplication    RNA Overabundance

Normal               1.00*               1.00**
BT474            +   2.89                2.57
MCF7             +   1.35            +   1.88
SKBR3            +   2.58            +   2.19
T47D                 2.28                nd
MDA157           +   1.52            +   2.52
UACC812          +   2.23                nd
MDA361               0.97            +   1.43
MDA453           +   1.58            +   5.92
BT20                                     1.07
600PE                0.94            +   2.00
MDA231           +   1.66            +   2.19
CAMA-1               0.92                0.71
DU4475               0.87            +   1.33
BT468                0.46                nd
MDA134               0.77            +   7.17

Incidence        8/15 (53%)          10/12 (83%)

+  Gene duplication or RNA overabundance; - no duplication or overabundance; nd = not done.
*  Degree of gene duplication is reported relative to placental DNA preparations.
** Degree of RNA overabundance is reported relative to the highest level observed for several cultures
   of normal epithelial cells.



The gene corresponding to CH14-2a16-1 was duplicated in 8 out of 15 (53%) of the cells
tested. The sequence of 114 bases from the 5' end of the cDNA fragment is shown in Figure 22
(SEQ ID NO:7). There was no substantial homology to any known gene in GenBank. One of the
three possible reading frames was found to be open, with the predicted amino acid sequence shown
in Figure 22 (SEQ ID NO:8).

The CH14-2a16-1 gene was further characterized by obtaining additional sequence
information. A λ-GT10 cDNA library from the breast cancer cell line BT474 (Example 2) was
screened using the initial cDNA insert, and two clones were identified: one with a 1.6 kb insert, and
the other with a 2.5 kb insert. The identified clones were subcloned into plasmid vector pCRII. The
1.6 kb insert was sequenced by using T7 and Sp6 primers for regions flanking the cDNA inserts as
initial sequencing primers. Sequencing continued by walking along the region of interest by standard
techniques, using sequencing primers based on data already obtained. Primers used are those
designated 1-11 in Figure 18.

A third clone (designated pCH14-800) overlapping on the 5' end (Figure 6) was obtained
using the CLONTECH Marathon™ cDNA Amplification Kit. Briefly, DNA primers CH14a, CH14b, CH14c
and CH14d (Figure 18) were prepared. Polyadenylated RNA from breast cancer cell line MDA453
was reverse transcribed using the CH14b primer. After second strand synthesis, adaptor DNA provided in
the kit was ligated to the double-stranded cDNA. The 5' end cDNA of CH14-2a16-1 was then
amplified by PCR using primers CH14b (or CH14c) and AP1 (provided in the kit). To increase the
specificity of the PCR products, the first PCR products were reamplified by PCR using nested primers
CH14a (or CH14d) and AP2 (provided in the kit). The PCR products were cloned into pCRII vector
(Invitrogen) and screened with a CH14-2a16-1 probe.

By sequencing pCH14-1.6 and pCH14-800, a nucleic acid sequence of 2021 base pairs
between the 5' end and the poly-A tail of CH14-2a16-1 has been determined. The DNA sequence is
shown in Figure 19 (SEQ. ID NO:26). The longest open reading frame is in frame 1 from base 1 to
792, and codes for 263 amino acids before the stop codon. The corresponding amino acid sequence
of this frame is shown in the upper panel of Figure 20 (SEQ. ID NO:27). The partial sequence
predicted for the translated protein is shown in the lower panel of Figure 20 (SEQ. ID NO:28). The 2.1
kb clone has not been sequenced, but is believed to consist of about the same region of the
CH14-2a16-1 cDNA as pCH14-1.6 and pCH14-800 combined.

A GENINFO® BLAST search of nucleotide and peptide sequence databases was performed
through the National Center for Biotechnology Information on March 26, 1996. Short segments of
homology with other reported human sequences were found at the nucleotide level (<500 base pairs),
but none with any ascribed function in the respective identifier. At the amino acid level, the sequence
was found to share homologies within the first 106 residues with an RNA binding protein from
Saccharomyces cerevisiae with the designation NAB2. NAB2 is one of the major proteins associated
with nuclear polyadenylated RNA in yeast cells, as detected by UV light-induced cross-linking and
immunofluorescence. NAB2 is strongly and specifically associated with nuclear poly(A)+ RNA in vivo.
Gene knock-out experiments have shown that this protein is essential to yeast cell survival
(Anderson et al.). Accordingly, the protein encoded by CH14-2a16-1 is suspected of having DNA or
RNA binding activity.

A fourth clone (pCH14-1.3) has been obtained that overlaps the pCH14-800 clone at the 5'
end (Figure 6). The method of isolation was similar to that for pCH14-800, using primers based on
the pCH14-800 sequence. Partial sequence data for pCH14-1.3 has been obtained by one-directional
sequencing from the 5' and 3' ends of the pCH14-1.3 clone. Figure 21 shows the
nucleotide sequence of the 5' end (SEQ. ID NO:29) and the amino acid translation of
the likely open reading frame (SEQ. ID NO:30); the nucleotide sequence of the 3' end (SEQ. ID
NO:31) and the likely open reading frame (SEQ. ID NO:32). This data is confirmed and additional
sequence between SEQ. ID NOS:29 and 31 is obtained by fully sequencing both strands of
pCH14-1.3. Once compiled, the sequence data from pCH14-1.3, pCH14-800 and pCH14-1.6 may be shorter
than the apparent size of mRNA observed in Northern analysis (Table 1). If necessary, further
sequence data at the 5' end is deduced by obtaining additional cloned cDNA according to approaches
described in this Example or Example 4.

Figure 25 is a listing of additional cDNA sequence obtained for CH14-2a16-1, comprising
approximately 1934 base pairs 5' from the sequence of Figure 19. The corresponding amino acid
translation is shown in the upper panel of Figure 26. The additional sequence data was obtained by
rescuing and amplifying further fragments of CH14-2a16-1 cDNA. Nested primers were designed
~100 base pairs downstream from the 5' end of the known sequence. The primers were used in a
nested amplification assay using AP1 and AP2, using the CLONTECH Marathon™ cDNA
Amplification Kit as described above. The template was a Marathon™ ready cDNA preparation from
human testes, also supplied by CLONTECH.

The nucleotide sequence shown in Figure 25 is closed at the 5' end. The lower panel of
Figure 26 shows what is predicted to be the sequence of the gene product beginning at the first
methionine residue. The nucleotide sequence shown contains a point difference at the position
indicated by the underlining in Figure 25. A base determined to be A from the previously obtained
polynucleotide fragment was a G in the one used in this part of the experiment. This corresponds to a
change from E (glutamic acid) to G (glycine) in the protein sequence, at the position underlined in
Figure 26. This may represent a natural allelic variation.
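
The A-to-G point difference and the resulting E-to-G change can be checked against the standard genetic code. The following Python sketch is illustrative only: the text does not state which codon is involved, so both glutamic acid codons (GAA and GAG) are tried, and the function name `substitute` is a hypothetical helper. Under the standard code, an A-to-G change converts glutamic acid to glycine only when it falls at the second codon position.

```python
# Minimal subset of the standard genetic code needed for this illustration.
CODON_TABLE = {"GAA": "E", "GAG": "E", "GGA": "G", "GGG": "G"}


def substitute(codon, position, base):
    """Return the codon with the base at the 0-based position replaced."""
    return codon[:position] + base + codon[position + 1:]


# Both glutamic acid codons are tried, since the source does not name the codon.
for codon in ("GAA", "GAG"):
    variant = substitute(codon, 1, "G")  # A -> G at the second codon position
    print(codon, CODON_TABLE[codon], "->", variant, CODON_TABLE[variant])
```

Either glutamic acid codon yields a glycine codon after the substitution, which is consistent with the single-base difference described above.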

A CH14-2a16-1 cloned insert has been used to probe the level of relative expression in
polyadenylated RNA from a panel of tissue sources obtained from CLONTECH, as in Example 4.
The relative CH14-2a16-1 expression observed at the mRNA level is shown in Table 11:

TABLE 11: Northern blot analysis

Tissue               Relative expression
heart                +
brain
placenta             +
lung                 +
liver
skeletal muscle      +
kidney
pancreas             +
spleen               +
thymus               +
prostate
testis               ++++
ovary                +
small intestine      +
colon                +
peripheral blood     +/-

++++   Very high
+++    High
++     Medium
+      Low
+/-    Very low


CH14-2a16-1 mRNA was particularly high in testis. The level of expression in breast cancer
cell lines is also quite high, since the Northern analysis performed on these lines was conducted on
total cellular RNA. It is likely that the CH14-2a16-1 gene is involved in a biological process that is
typical to the tissue types showing medium to high levels of expression, which may relate to increased
tissue growth or metabolism.

Five motifs corresponding to a zinc finger protein have been found in the CH14-2a16-1
nucleotide sequence. Further zinc finger motifs may be present in CH14-2a16-1 in the upstream
direction. Zinc finger motifs are present, for example, in RNA polymerases I, II, and III from S.
cerevisiae, and are related to the zinc knuckle family of RNA/ssDNA-binding proteins found in the HIV
nucleocapsid protein. The actual sequence observed in each of the five zinc finger motifs of
CH14-2a16-1 is:

Cys -^Xaa^<--C^Xaa)^-^;y&>{Xaa)^-HiS or (SEQ. ID NO:38)

Cys-<Xaa)s-£xs-{Xaa)s-Cxa^ (SEQ. ID NO:39)

which is indicated in Figure 20 by underlining. This is identical to the 7 zinc finger motifs of NAB2,
which make up an RNA/ssDNA binding region (Anderson et al.). Accordingly, the CH14-2a16-1 gene
product is suspected of having DNA or RNA binding activity, and may be specific for polyadenylated
RNA. It may very well play a role in the regulation of gene replication, transcription, the processing of
hnRNA into mature mRNA, the export of mRNA from the nucleus to the cytoplasm, or translation into
protein. This role in turn may be closely implicated in cell growth or proliferation, particularly as
manifest in tumor cells.
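
A motif of the general form Cys-(Xaa)n-Cys-(Xaa)n-Cys-(Xaa)n-His can be located in a translated sequence with a regular expression. The Python sketch below is an illustration, not the procedure used in this Example. The spacer lengths passed to `motif_regex` are placeholders chosen for the demonstration, because the spacings of SEQ. ID NOS:38 and 39 are not legible in this copy, and the test peptide is synthetic.

```python
import re


def motif_regex(spacers, residues="CCCH"):
    """Build a pattern of fixed residues separated by runs of any residue.

    spacers holds one (min, max) pair for each gap between consecutive residues.
    """
    parts = [residues[0]]
    for (low, high), residue in zip(spacers, residues[1:]):
        parts.append(".{%d,%d}" % (low, high))
        parts.append(residue)
    return "".join(parts)


def find_motifs(protein, pattern):
    """Return (start, matched_text) for every match, including overlapping ones."""
    lookahead = re.compile("(?=(%s))" % pattern)
    return [(m.start(), m.group(1)) for m in lookahead.finditer(protein)]


# Hypothetical spacing, for demonstration only (not taken from the sequence listing).
pattern = motif_regex([(5, 5), (5, 5), (3, 3)])
toy_peptide = "MK" + "CAAAAACGGGGGCTTTH" + "LL"  # synthetic sequence
print(find_motifs(toy_peptide, pattern))
```

Counting the matches returned for a real translation would give the number of candidate zinc finger motifs under whichever spacing is supplied.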


Example 8: Identification of other cancer-associated genes

cDNA fragments corresponding to additional cancer-associated genes are obtained by
applying the techniques of Examples 1 & 2 with appropriate adaptations. As before, cancer cells
are selected for use in differential display of RNA, based on whether they share a duplicated
chromosomal region according to Table 12:





TABLE 12: Cancer cell lines sharing duplicated chromosomal regions

Chromosomal
location          Cancer type & references

1p22-32           small cell (Levin 1994)
1p22              bladder (Kallioniemi 1995)
1p32-33           rhabdomyosarcoma (Steilen-Gimbel); breast (Ried 1995);
                  small cell lung (Ried 1994)
1q21-22           sarcoma (Forus 1995a & b); breast (Muleris 1994a)
1q24              small cell (Levin 1994)
1q31              bladder (Kallioniemi 1995)
1q32              glioma (Muleris 1994b; Schrock)
1q                head and neck (Speicher 1995); breast (Muleris 1994a)
2p23              small cell lung (Ried 1994)
2p24-25           small cell lung (Levin 1994)
2                 head and neck (Speicher 1995)
2q                head and neck (Speicher 1995)
2q33-36           head and neck (Speicher 1995)
3p22-24           bladder (Voorter), small cell (Levin 1994)
3q24-26           bladder (Kallioniemi 1995), glioma (Kim), osteosarcoma (Tarkkanen)
3q25-26           ovarian (Iwabuchi)
3q26-term         head and neck (Speicher 1995)
3q                small cell lung (Levin 1995; Ried 1994); head and neck (Speicher 1995)
5p                small cell lung (Levin 1994 & 1995; Ried 1994)
5p16.1            glioma (Muleris 1994b)
[illegible]       osteosarcoma (Forus 1995a), breast ([illegible])
6p21-term         melanoma (Speicher)
7p                glioma (Schlegel 1994 & 1996: may be EGFR)
7p11-12           glioma (Muleris 1994b; Schrock), small cell lung (Ried 1994)
7q21-32           glioma (Kim; Muleris 1994b; Schrock)
7q21-22           head and neck (Speicher), glioma (Schrock)
7q33-term         head and neck (Speicher 1995)
7                 colon (Schlegel 1995); glioma (Kim), head and neck (Speicher);
                  prostate (Visakorpi)
8q                small cell lung (Ried 1994)
8q21              bladder (Kallioniemi 1995)
[illegible]       myeloid leukemia (Mohamed)
8q22-24           glioma (Kim; Muleris 1994b); breast (Muleris 1994a)
8q24-25           small cell (Levin 1994; Ried 1994); breast (Muleris 1994a)
8q23-term         sarcoma (Forus 1995a), melanoma (Speicher)
[illegible]       ovarian (Iwabuchi)
[illegible]       breast (Ried 1995; Isola; Muleris 1994a), small cell lung (Levin 1994 & 1995),
                  B-cell leukemias (Bentz 1994a), myeloid leukemia (Bentz 1994b), glioma (Schlegel),
                  head and neck (Speicher 1995), prostate (Cher; Visakorpi)
9                 head and neck (Speicher)
9p
9p2               glioma (Muleris 1994b)
9p13              breast (Muleris 1994a)
10p               head and neck (Speicher 1995)
10p13-14          bladder (Voorter)
10q22             breast (Muleris 1994a)
11q13             head and neck (Speicher 1995), breast (Muleris 1994a)
12                B-cell leukemias (Bentz 1995a)
12p               head and neck (Speicher 1995), glioma (Schrock)
12q               glioma (Schlegel 1994)
12q12-15          bladder (Voorter), osteosarcoma (Tarkkanen), liposarcoma (Suijkerbuijk)
12q21.3-22        liposarcoma (Suijkerbuijk)
13                colon (Schlegel 1995)
13q               breast (Ried 1995), head and neck (Speicher 1995)
13q21-34          bladder (Kallioniemi 1995)
13q32-term        head and neck (Speicher 1995), small cell lung (Ried 1994)
14q               head and neck (Speicher 1995)
15q26             breast (Muleris 1994a)
16                head and neck (Speicher 1995)
16p               breast (Ried 1995)
16p11.2           breast (Muleris 1994a)
17                head and neck (Speicher 1995)
17p11-12          osteosarcoma (Forus 1995a; Tarkkanen)
[illegible]       breast (Ried 1995), small cell lung (Ried 1994)
17q21.1           breast (Muleris 1994a)
17q22-23          bladder (Voorter), breast (Muleris 1994a)
17q22-24          breast (Kallioniemi 1994)
18p11             bladder (Voorter)
19q13.1           small cell lung (Ried 1994)
[illegible]       head and neck (Speicher 1995)
20q               ovarian (Iwabuchi), colon (Schlegel 1995), breast (Isola; Tanner)
20q13.3           breast (Muleris 1994a; Kallioniemi 1994)
22q               head and neck (Speicher 1995)
22q11-13          bladder (Voorter), glioma (Schrock)
X                 prostate (Visakorpi)
Xq                small cell lung (Levin 1995)
Xq24              small cell (Levin 1994)
Xq11-13           prostate (Visakorpi), osteosarcoma (Tarkkanen)
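
Choosing cancer cells that share a duplicated region amounts to a lookup in Table 12. The Python sketch below shows one way such a lookup might be organized. It uses only a four-row excerpt of the table, and the names `TABLE_12_EXCERPT` and `regions_for` are hypothetical helpers introduced for illustration.

```python
# Excerpt of Table 12: chromosomal region -> cancer types reported with a gain there.
TABLE_12_EXCERPT = {
    "1q21-22": ["sarcoma", "breast"],
    "3q25-26": ["ovarian"],
    "17q22-23": ["bladder", "breast"],
    "20q13.3": ["breast"],
}


def regions_for(cancer_type, table):
    """Return the regions in which the given cancer type is listed, sorted by name."""
    return sorted(region for region, types in table.items() if cancer_type in types)


# Regions in which breast cancer cells would be candidates for differential display.
print(regions_for("breast", TABLE_12_EXCERPT))
```

Cell lines of the selected type known to carry one of the returned regions would then be paired for the differential display described in Examples 1 & 2.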



Control RNA is prepared from normal tissues to match that of the cancer cells in the
experiment. Normal tissue is obtained from autopsy, biopsy, or surgical resection. Absence of
neoplastic cells in the control tissue is confirmed, if necessary, by standard histological techniques.
cDNA corresponding to RNA that is overabundant in cancer cells and duplicated in a proportion of
the same cells is characterized further, as in Examples 3-7. Additional cDNA comprising an entire
protein-product encoding region is rescued or selected according to standard molecular biology
techniques as described elsewhere in this disclosure.

Claims

What is claimed as the invention is: 

1. An isolated polynucleotide comprising a linear sequence of at least 10 nucleotides identical to
a linear sequence contained in a polynucleotide selected from the group consisting of
CH8-2a13-1, CH13-2a12-1, CH14-2a16-1, and CH1-9a11-2.

2. An isolated polynucleotide comprising a linear sequence of at least 40 consecutive
nucleotides at least 90% identical to a linear sequence contained in a sequence selected
from the group consisting of SEQ. ID NO:15, SEQ. ID NO:18, SEQ. ID NO:21, SEQ. ID
NO:23, SEQ. ID NO:26, SEQ. ID NO:29, SEQ. ID NO:31, SEQ. ID NO:33, and SEQ. ID
NO:35; but not in any of SEQ. ID NOS:1, 3, 5, and 7.

3. The isolated polynucleotide of claim 2, comprising a linear sequence of at least 100
consecutive nucleotides at least 90% identical to a sequence contained in the selected
sequence.

4. The isolated polynucleotide of claim 2, comprising a linear sequence of at least 40
consecutive nucleotides at least 95% identical to a sequence contained in the selected
sequence.
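
The length and percent-identity thresholds recited above can be illustrated with a short Python sketch. It is a simplification and not the definition of identity used in this disclosure: comparison is ungapped, each window of the candidate is compared against each same-length window of the reference, and the names `count_matches` and `meets_identity` are hypothetical.

```python
def count_matches(a, b):
    """Count positions at which two equal-length sequences carry the same base."""
    return sum(x == y for x, y in zip(a, b))


def meets_identity(candidate, reference, length=40, percent=90):
    """True if some window of `length` consecutive bases in the candidate is at
    least `percent` identical (ungapped) to some window of the reference."""
    for i in range(len(candidate) - length + 1):
        window = candidate[i:i + length]
        for j in range(len(reference) - length + 1):
            # Integer arithmetic avoids floating-point rounding at the threshold.
            if count_matches(window, reference[j:j + length]) * 100 >= percent * length:
                return True
    return False


# Synthetic example: a 40-base window with 4 mismatches is exactly 90% identical.
reference = "ACGT" * 15
candidate = list(reference[4:44])
for k in (0, 10, 20, 30):
    candidate[k] = "A" if candidate[k] != "A" else "C"
candidate = "".join(candidate)
print(meets_identity(candidate, reference, 40, 90))
print(meets_identity(candidate, reference, 40, 95))
```

A gapped alignment tool would be needed to handle insertions and deletions; the exhaustive window comparison here is only practical for short sequences.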

5. An isolated polynucleotide comprising a linear sequence of at least 40 consecutive
nucleotides that hybridizes with a DNA having a sequence selected from the group consisting
of SEQ. ID NO:15, SEQ. ID NO:18, SEQ. ID NO:21, SEQ. ID NO:23, SEQ. ID NO:26, SEQ.
ID NO:29, SEQ. ID NO:31, SEQ. ID NO:33, and SEQ. ID NO:35; under conditions where it
does not hybridize with SEQ. ID NOS:1, 3, 5, 7, or any other DNA from a human cell.

6. The isolated polynucleotide of claim 5, wherein the linear sequence is at least 100
consecutive nucleotides.

7. An isolated polynucleotide comprising a sequence of at least 40 consecutive nucleotides that
hybridizes with an RNA having a sequence selected from the group consisting of SEQ. ID
NO:15, SEQ. ID NO:18, SEQ. ID NO:21, SEQ. ID NO:23, SEQ. ID NO:26, SEQ. ID NO:29,
SEQ. ID NO:31, SEQ. ID NO:33, and SEQ. ID NO:35; under conditions where it does not
hybridize with SEQ. ID NOS:1, 3, 5, 7, or any other RNA from a human cell.

8. The isolated polynucleotide of claim 7, wherein the linear sequence is at least 100
consecutive nucleotides.

9. The isolated polynucleotide of any of claims 2-8, wherein said linear sequence is contained in
a duplicated gene or overabundant RNA in cancerous cells.

10. The isolated polynucleotide of any of claims 2-8, which is a CH13-2a12-1 polynucleotide, and
is contained in an encoding region for a protein or RNA molecule that controls cell
proliferation.

11. The isolated polynucleotide of any of claims 2-8, which is a CH14-2a16-1 polynucleotide, and
is contained in an encoding region for a protein with DNA or RNA binding activity.

12. The isolated polynucleotide of any of claims 2-8, present in a recombinant plasmid deposited
under ATCC Accession No. 96074.

13. The isolated polynucleotide of any of claims 2-8, present in a recombinant phage deposited
under ATCC Accession No. 97595.

14. The isolated polynucleotide of any of claims 2-8, present in the XBCBT474 cDNA library
deposited under ATCC Accession No. 97594.

15. An isolated polynucleotide comprising a linear sequence of polynucleotides essentially
identical to a sequence selected from the group consisting of SEQ. ID NO:15, SEQ. ID
NO:18, SEQ. ID NO:21, SEQ. ID NO:23, SEQ. ID NO:26, SEQ. ID NO:29, SEQ. ID NO:31, SEQ.
ID NO:33, and SEQ. ID NO:35.

16. An isolated polypeptide comprising a linear sequence of at least 5 amino acid residues
identical to a sequence encoded by a polynucleotide selected from the group consisting of
CH1-9a11-2, CH8-2a13-1, CH13-2a12-1, and CH14-2a16-1.

17. An isolated polypeptide comprising a linear sequence of at least 5 consecutive amino acids
identical to a linear sequence contained in a sequence selected from the group consisting of
SEQ. ID NO:17, SEQ. ID NO:20, SEQ. ID NO:22, SEQ. ID NO:24, SEQ. ID NO:28, SEQ. ID
NO:30, SEQ. ID NO:32, SEQ. ID NO:34, and SEQ. ID NO:37; but not in any of SEQ. ID
NOS:2, 4, 6, and 8.

18. The isolated polypeptide of claim 17, comprising a linear sequence of at least 15 consecutive
amino acids at least 90% identical to a linear sequence contained in the selected sequence.

19. The isolated polypeptide of claim 17 or 18, wherein said linear sequence is encoded in a
duplicated gene or overabundant RNA in cancerous cells.

20. The isolated polypeptide of claim 17 or 18, which is overexpressed in cancerous cells.

21. The isolated polypeptide of claim 17 or 18, wherein the polynucleotide selected from said
group is a CH1-9a11-2 polynucleotide, and the polypeptide is a transmembrane polypeptide.

22. An isolated polypeptide comprising a linear sequence of amino acids essentially identical to a
sequence selected from the group consisting of SEQ. ID NO:17, SEQ. ID NO:20, SEQ. ID
NO:22, SEQ. ID NO:24, SEQ. ID NO:28, SEQ. ID NO:30, SEQ. ID NO:32, SEQ. ID NO:34,
and SEQ. ID NO:37; but not in any of SEQ. ID NOS:2, 4, 6, and 8.

23. An isolated polynucleotide comprising an encoding sequence for the polypeptide of any of
claims 17 to 22.



24. A monoclonal or isolated polyclonal antibody specific for the polypeptide of claim 22.

25. A method of detecting gene duplication in cancerous cells, comprising the steps of:

a) reacting DNA contained in a clinical sample with a reagent comprising the
polynucleotide of claims 2-8, said clinical sample having been obtained from an
individual suspected of having cancerous cells; and

b) comparing the amount of any complexes formed between the reagent and the DNA in
the clinical sample with the amount of any complexes formed between the reagent and
DNA in a control sample.

26. A method of detecting overabundance of RNA in cancerous cells, comprising the steps of:

a) reacting RNA contained in a clinical sample with a reagent comprising the
polynucleotide of claim 2-8, said clinical sample having been obtained from an individual
suspected of having cancerous cells; and

b) comparing the amount of any complexes formed between the reagent and the RNA in
the clinical sample with the amount of any complexes formed between the reagent and
RNA in a control sample.

27. A method of determining gene duplication or overabundance of RNA in cancerous cells,
comprising the steps of:

a) amplifying DNA or RNA in a clinical sample with a primer comprising the polynucleotide
of claim 2-8 to yield an amplified polynucleotide, said clinical sample having been
obtained from an individual suspected of having cancerous cells; and

b) comparing the amount of polynucleotide amplified from the DNA or RNA with the
amount of polynucleotide amplified from DNA or RNA from a control sample.
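
The comparing step shared by the three preceding methods reduces to a ratio of the clinical-sample signal to the control-sample signal. The Python sketch below is one possible way to express it. Normalizing each signal to a reference (loading) signal and the 2-fold cutoff are assumptions made for illustration, and the function names are hypothetical.

```python
def relative_abundance(sample_signal, control_signal,
                       sample_reference=1.0, control_reference=1.0):
    """Ratio of normalized sample signal to normalized control signal.

    Each signal is first divided by a reference signal measured on the same
    preparation (an assumed normalization, e.g. for unequal loading).
    """
    if min(control_signal, sample_reference, control_reference) <= 0:
        raise ValueError("control and reference signals must be positive")
    return (sample_signal / sample_reference) / (control_signal / control_reference)


def is_elevated(ratio, cutoff=2.0):
    """Flag a ratio at or above an illustrative cutoff (2-fold by default)."""
    return ratio >= cutoff


# Hypothetical densitometry readings in arbitrary units.
ratio = relative_abundance(900.0, 300.0, sample_reference=1.5, control_reference=1.0)
print(ratio, is_elevated(ratio))
```

In practice the cutoff would be set from the variation seen among control samples rather than fixed in advance.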

28. A method of screening for cancer associated with a gene duplication in an individual,
comprising the steps of:

a) determining gene duplication in cells from the individual according to the method of claim
25; and

b) correlating any gene duplication determined in step a) with an increased risk for the
cancer.

29. A method of screening for cancer associated with overexpression of RNA in an individual,
comprising the steps of:

a) determining overexpression of RNA in cells from the individual according to the method
of claim 26; and

b) correlating any RNA overexpression determined in step a) with an increased risk for the
cancer.

30. A method of screening for cancer associated with a gene duplication or overexpression of
RNA in an individual, comprising the steps of:

a) determining gene duplication or overexpression of RNA in cells from the individual
according to the method of claim 27; and

b) correlating any gene duplication or overexpression of RNA determined in step a) with an
increased risk for the cancer.

31. The method of any of claims 28-30, which is a screening method for breast cancer.

32. A diagnostic kit for detecting gene duplication or RNA overabundance in cells contained in an
individual as manifest in a clinical sample, comprising a reagent and a buffer in suitable
packaging, wherein the reagent comprises the polynucleotide of any of claims 2-8.

33. A method for detecting altered protein expression in cancerous cells, comprising the steps of:

a) reacting a polypeptide contained in a clinical sample with a reagent comprising the
antibody of claim 24, said clinical sample having been obtained from an individual
suspected of having cancerous cells; and

b) comparing the amount of any complexes formed between the reagent and the
polypeptide in the clinical sample with the amount of any complexes formed between the
reagent and a polypeptide in a control sample.

34. A diagnostic kit for detecting a polypeptide present in a clinical sample, comprising a reagent
and a buffer in suitable packaging, wherein the reagent comprises the antibody of claim 24.

35. A host cell genetically altered by the polynucleotide of any of claims 2 to 8 or claim 23.

36. A method of screening a pharmaceutical candidate, comprising the steps of:

a) separating progeny of the cell of claim 35 into a first group and a second group;

b) treating the first group of cells with the pharmaceutical candidate;

c) not treating the second group of cells with the pharmaceutical candidate; and

d) comparing the phenotype of the treated cells with that of the untreated cells.

37. A pharmaceutical preparation for use in cancer therapy, comprising the polynucleotide of
claim 2 to 8 or claim 23, said preparation being capable of reducing the pathology of
cancerous cells.

38. A method for treating an individual bearing cancerous cells, comprising administering the
pharmaceutical preparation of claim 37.

39. A pharmaceutical preparation for use in cancer therapy, comprising the antibody of claim 24,
said preparation being capable of reducing the pathology of cancerous cells.

40. A method for treating an individual bearing cancerous cells, comprising administering the
pharmaceutical preparation of claim 39.

41. A pharmaceutical preparation comprising the polypeptide of claim 17 or 18 in an
immunogenic form, and a pharmaceutically compatible excipient.

42. A method for treatment of cancer, comprising administration of the pharmaceutical
preparation of claim 41.



43. A method for obtaining cDNA corresponding to a gene that is duplicated or overexpressed
in cancer, comprising the steps of:

a) supplying an RNA preparation from control cells;

b) supplying RNA preparations from at least two different cancer cells;

c) displaying cDNA corresponding to the RNA preparations of step a) and step b) such that
different cDNA corresponding to different RNA in each preparation are displayed
separately;

d) selecting cDNA corresponding to RNA that is present in greater abundance in the
cancer cells of step b) relative to the control cells of step a);

e) supplying a digested DNA preparation from control cells;

f) supplying digested DNA preparations from at least two different cancer cells;

g) hybridizing the cDNA of step d) with the digested DNA preparations of step e) and step
f); and

h) further selecting cDNA from the cDNA of step d) corresponding to a gene that is
duplicated in the cancer cells of step f) relative to the control cells of step e).
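
The two selection steps d) and h) can be pictured as successive filters over candidate cDNAs. The Python sketch below is a schematic under stated assumptions and not the claimed method itself: signals are taken to be already quantified, the fold thresholds are arbitrary, RNA overabundance is required in every cancer preparation while duplication is required in at least one, and all names and values are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    name: str
    rna_control: float        # display band intensity in control cells, step a)
    rna_cancer: List[float]   # intensities in each cancer cell preparation, step b)
    dna_control: float        # hybridization signal on control DNA, step e)
    dna_cancer: List[float]   # signals on each cancer cell DNA digest, step f)


def select_overabundant(candidates, fold=2.0):
    """Step d): keep cDNA whose RNA is more abundant in every cancer preparation."""
    return [c for c in candidates
            if all(x >= fold * c.rna_control for x in c.rna_cancer)]


def select_duplicated(candidates, fold=1.5):
    """Step h): keep cDNA whose gene signal is increased in at least one cancer digest."""
    return [c for c in candidates
            if any(x >= fold * c.dna_control for x in c.dna_cancer)]


# Hypothetical measurements in arbitrary units.
pool = [
    Candidate("cDNA-A", 1.0, [3.0, 4.0], 1.0, [2.0, 1.0]),
    Candidate("cDNA-B", 1.0, [3.0, 1.2], 1.0, [2.0, 2.0]),
    Candidate("cDNA-C", 1.0, [5.0, 6.0], 1.0, [1.0, 1.1]),
]
selected = select_duplicated(select_overabundant(pool))
print([c.name for c in selected])
```

Applying the RNA filter first mirrors the order of the steps: only cDNA that passes step d) is carried forward to the hybridization of step g).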



44. The method of claim 43, wherein the two different cancer cells used to supply RNA in step
b) share a duplicated gene in the same region of a chromosome.

45. The method of claim 43, wherein RNA preparations from at least three different cancer
cells are supplied in step b).

46. The method of claim 43, wherein the three different cancer cells used to supply RNA in
step b) share a duplicated gene in the same region of a chromosome.

47. The method of claim 43, wherein the control cells of step a) are uncultured.

48. The method of claim 43, further comprising supplying a digested mitochondrial DNA
preparation; hybridizing the cDNA of step h) with the digested mitochondrial DNA
preparation; and further selecting cDNA from the cDNA of step h) corresponding to genes
that do not hybridize with the digested mitochondrial DNA preparation.

49. The method of claim 43, further comprising the steps:

i) supplying an RNA preparation from control cells;

j) supplying RNA preparations from at least two different cancer cells;

k) hybridizing the cDNA of step h) with the RNA preparations of step i) and step j); and

l) further selecting cDNA from the cDNA of step h) corresponding to RNA that is present in
greater abundance in the cancer cells of step j) relative to the control cells of step i).

50. The method of claim 49, wherein the gene to which the cDNA corresponds is not
duplicated in at least one of the cancer cells used to supply the RNA in step j) relative to
the control cells of step e).

51. The method of claim 43, wherein the two different cancer cells used to supply the RNA
preparations in step b) are breast cancer cells.

52. The method of claim 43, wherein the two different cancer cells used to supply the RNA
preparations in step b) are from a common type of cancer, wherein the type of cancer is
selected from the group consisting of lung cancer, glioblastoma, pancreatic cancer, colon
cancer, prostate cancer, hepatoma, and myeloma.

53. The method of claim 43, wherein the two different cancer cells used to supply the digested
DNA preparations in step f) are breast cancer cells.

54. The method of claim 43, wherein the two different cancer cells used to supply the digested DNA
preparations in step f) are from a common type of cancer, wherein the type of cancer is
selected from the group consisting of lung cancer, glioblastoma, pancreatic cancer, colon
cancer, prostate cancer, hepatoma, and myeloma.

55. A method for obtaining cDNA corresponding to a gene that is deleted or underexpressed in
cancer, comprising the steps of:

a) supplying an RNA preparation from control cells;

b) supplying RNA preparations from at least two different cancer cells that share a deleted
gene in the same region of a chromosome;

c) displaying cDNA corresponding to the RNA preparations of step a) and step b) such that
different cDNA corresponding to different RNA in each preparation are displayed
separately; and

d) selecting cDNA corresponding to RNA that is present in lower abundance in the cancer
cells of step b) relative to the control cells of step a).

56. The method of claim 55, further comprising the steps of:

e) supplying a digested DNA preparation from control cells;

f) supplying digested DNA preparations from at least two different cancer cells;

g) hybridizing the cDNA of step d) with the digested DNA preparations of step e) and step
f); and

h) further selecting cDNA from the cDNA of step d) corresponding to a gene that is deleted
in the cancer cells of step f) relative to the control cells of step e).



57. A method for characterizing a gene that is duplicated or has altered expression in cancer,
comprising obtaining cDNA corresponding to the gene according to the method of any of
claims 43-56, and then sequencing the cDNA.

58. A method of screening a candidate drug for cancer treatment comprising obtaining cDNA
corresponding to a gene that is duplicated or has altered expression in cancer according to
the method of any of claims 43-56, and comparing the effect of the candidate drug on a
cell genetically altered with the cDNA with the effect on a cell not genetically altered with
the cDNA.

Figure 3

Figure 4

Figure 5

Figure 6

Figure 7


+ strand (sense) sequence (5'-->3')

No.   Primer          1st base   Sequence
 1.   pch1-t7-1f      1123       CGG GAG GTT TCA GAT CGA C
 2.   pch1-t7-2f      1437       GCG CTG CAA GTA CAA AAT TG
 3.   pch1-t7-3f      1729       TCT AAA GTC CAA GAC CAA GG
 4.   pch1-t7-4f      1987       CAG AAA TTA TGG TTT CTA CC
 5.   pch1-t7-5f      2266       CTG GAA GAG GAG GGA TAA C
 6.   pch1-sp6-3rb    2684       AAA CAT ACA CAA TAA ACA C
 7.   pch1-sp6-2rb    2966       TTG GCA GCA ACT GTA TTT G
 8.   pch1-sp6-1rb    3283       CCT GAT TTT ATA GAA GCC CC

- strand (antisense)

No.   Primer          1st base   Sequence
 9.   pch1-sp6-1f     3302       GGG GCT TCT ATA AAA TCA GG
10.   pch1-sp6-2f     2987       ATT CAA ATA CAG TTG CTG C
11.   pch1-sp6-3f     2705       TTA GTG TTT ATT GTG TAT G
12.   pch1-sp6-4f     2458       AGT GTT CAT TTC CAG TGA G
13.   pch1-sp6-5f     2066       CTT TGT TCT TGG ACT TTA G
14.   pch1-t7-3rb     1748       CCT TGG TCT TGG ACT TTA G
15.   pch1-t7-2rb     1445       AAT TTT GTA CTT GCA GCG C
16.   pch1-t7-1rb     1141       GTC GAT CTG AAA CCT CCC G
17.   CH1a            1063       GTG CCT GTA GCA ACT GGA TGG
18.   CH1b            1079       GTC ATG TTG GTC AGC TGT GCC

Figure 8(A)


   1 GAATACATAT ATAAATGGTG TTCAGTTAGA GTTCCTCTTT ATOGGCAGCG
  51 CAGCCGAACT GCTTTGAGTA AAGGAAAAGA TTATCTTGro TTAGCTCAAC
 101 CACCCTTACT ACTTCCTGCG GAATCAGTAG ATGTrrCAGT ATTGCAACCT
 151 CTGAGTOGAG AATTGGAAAA TACGAATATA GAAAGGGAAG CTGAAACTGT
 201 TGTTCTGGGT GATTTAAGTA GTAGTATGCA CCAGGATGAC TTGGTGAATC
 251 ACACTGTAGA TGCAGTKSAA CTTGAACCAA GCCATTCTCA AACTCTTTCT
 301 CAGTCTCTTC TTTTAGATAT TACCCCAGAA ATCAATCCCT TGCCTAAAAT
 351 AGAAG?rATCT GAGTCTGTTG AATATGAGGC AGGACATATA CXTITCACCAG
 401 TGATTCCCCA AGAGAGTTCT GTTGAGATCG ATAATGAAAC AGAACAAAAG
 451 TCTGAGAGCT TTAGTTCTAT AGAGAAACCA TCTATTACCT ATGAAACAAA
 501 TAAAGTTAAT GAGTTAATGG ATAATATTAT AAAAGAAGAT ATGAACICCA
 551 TGCAAATTTT CACAAAGCTG TCTGAAACAA TAGIGCX:ACC AATAAATACA
 601 GCCACTGTAC CCGACAATGA AGATGGGGAA GCCAAAATQA ATATAGCTGA
 651 CACAGGAAAG CAAACTTTGA TTTCTGTTGT GGATTCTTCT TCATTACCTG
 701 AAGTAAAAGA AGAAGAACAG TCTCCAGAAG ATGCCCTXTR GAGAGGGTTA
 751 CAGAGGACAG CTACAGATTT TTATGCTGAA TTGCAAAATT CTACAGATCT
 801 AGGATATGCT AATGGAAATC TTGTACATGG ATCAAACCAA AAGGAGTCAG
 851 TATTTATGAG ACTTAATAAT CGTATTAAAG CCTTAGAAGT TAACATGTCT
 901 CICAGTGGTC GCTATCTGGA GGAC3CTTAGC CAAAGGTACC GAAAACAAAT
 951 GGAAGAAATG CAAAAGGCTT TCAACAAAAC AATCGTGAAA CTTCAGAATA
1001 CTTCAAGAAT AGCAGAGGAG CAGGATCAGC GGCAAACTGA AGCCATCCAG
1051 TTGCTACAGG CACAGCTGAC CAACATGACA CAGCTTGTTT CAAATTTATC
1101 AGCAACAGTA GCAGAATZGA AACGGGAGGT TTCAGATCGA CAAAGCTATC
1151 TTGTCATATC [illegible] TCTGTTGTCT TGGGACTGAT GCnTGTATG
1201 CAGCX3TTGTC GAAATACnC TCAATTTGAT GGAGATTATA TTTCAAAACT
1251 TCCTAAAAGT AATCAGXATC CAAGCCCTAA AAGGTGTTTC TCTTCCTATG
1301 ATGATATGAA TTTGAAAAGA AGAACTTCAT TCCCACTCAT GAGATCCAAG
1351 TCTCTACAGT TAACTGGCAA AGAAGTAGAC CCAAATGATT TGTACATTGT
1401 AGAACCCCTC AAGTTTTCTC CAGAAAAGAA GAAGAAGCGC TGCAAGTACA
1451 AAATTGAAAA AATTGAGACC ATAAAGCCTG AAGAACCATT GCACCCCATA
1501 GCCAATGGCG ACATAAAAGG AAGAAAGCCC TTTACGAACC AGAQAGATTT
1551 TTCTAATATG GGAGAAGTIT ATCACTCTTC TEATAAAGCT CXTTCCA TCTG
1601 AAGGAAGCTC AGAAACrrCA TCACAGTCAG AAGAGTCCTA TnTTGTGGC
1651 ATTTCAGCTT GCACAAGTCT GTGCAATGGA CAGTCTCAAA AGACAAAAAC
1701 TGAGAAGAGG GCTTTAAAAC GAAGACGATC TAAAGTCCAA GACCAAGGAA
1751 AATTGATAAA AACTCTAATA CAGACTAAGT CGGGATCATT GCCGAGCCTG
1801 CATGACATAA TCAAAGGAAA CAAAGAGAIC ACCGTGGGAA GATITGGTGT
1851 TACAGCAGTC TCGGGACATA TCTAAAATTA ATTCAACTTT ICATACAGAA
1901 [illegible] TTGTTGTTCT TTGAAGAACA GTCTGTAGTA TTTGAAGGGT
1951 TTGGGGGAGG GAGAAAATAT TAATGGGAAA GGCATZX:AGA AATTATGGTT
2001 TCTACCTTTT TAAAAAGTAG AOGGGATTCT GCTCAATCTT GGTTAATGAG
2051 CTACAGTTTT ACAAAGCTGA TXaiCTTCCTA TAAGGACAAT GGTAGACATT
2101 TTATAAAGAT G TriTn XAC AAGATTAATT ACTGGGACAA AAGTAATTTG
2151 GAAGCCCAGT TCCTTAGGTO GGATAGGAAT GAAAGCCTAA ACCTCTTCCT
2201 TTAGcrrrGT TCCTATTTCT TGCACCTTCC CATATTTATG TGCCTTTTGT
2251 CTATTTATAA TGCCACTGGA AGAGGAGGGA TAACTTITTC TGTTATTTGA
2301 'mxn'riTAT AACTTTGTTA GGTTnTGAA GCTGCAAACA CTACAATGCT
2351 TTGAGGOGGT CTCTCCCTGA AGCTCAGGAG TGTGGATCAG ACAGTCTAAA
2401 GATCCTAAAA ACTTGCCAAC TGGATCTTTG TTTAGCAAAC TCACTGGAAA
2451 TGAACACTTA ATGGAATTTT TAAGTCTGTT CTGTTAGGTA GATOGTCATG
2501 CICITOITAT TTTCACTXAT TCAGGCTGGA TTRCTTCITA CTTAGTTACT
2551 AACTCAATSA OGAAAAAATC CCTACAGGAT CTTTrTTIGC AAACAACTCA

Figure 8(B)


2601 TATATGCAGA CAAATTTTTG ACAAAOTCAC CTTTTAAACA CGACGTTAAC
2651 CGATTTCTGA AGGTmCTT TAGCTTACAT TTTAAACATA CACAATAAAC
2701 ACTAATCCTC CAAACTTTCA CIGTTTTrAT TAGTATCAAT ATAAAATTTO
2751 AAGG-mWC CAATTAGTAC AAGTCTCATG ATATAATCAC AQCCTOCATA
2801 CATATgCACA GATCCAGTTA G?1GAGTTTCT CAAGCTTAAT CTAATTGGTT
2851 AAGTCTAAAG AGATTAITAT TCCTTGATC3T TTCCTITCTA TTCGCTACAA
2901 ATGTGCAGAG CTAATACATA TGTGATCTCG ATGTCTCTCT C T iTlTi ' l ' lT
2951 GTCTTTAAAA AATAATTGGC AGCAACTGTA TTTGAATAAA ATCATTTCTT
3001 AGTATGATTG TACAGTAATO AATGAAAGTG GAACATXyiTT CTTTTTGAAA
3051 GGGAGAGAAT TGACCATTTA ITOTTGTGAT GTITAAGITA TAACTTATTC
3101 AGCACrr rTA GTAGTGAIAA CTGTmTAA ACTTGCCIAA TACCTlTCTr
3151 GGGTATTGTT TCTAATGTGA CTTATTTAAC GCXTTCTITG TTTCTTTAAG
3201 TTGCTQCTTT AGGTTAACAG CGICTTTTAG AAGATTTAAA TTTCTTTCCT
3251 GTOTGCACAA TTAGCTATIC AGAGCAAGAG GGCCTGATTT TATAGAAGCC
3301 CCTTGAAAAG AGGTXXAGAT GAGAGCAGAG ATACAGTCAG AAATTATCTC
3351 ATCTGTGTGT TGTQGGAAGA GAATTTTCAA TATGTAACTA CGGAGCTCTA
3401 GTGCCATTAG AAACTGTGAA TTTCCAAATA AATCTGAACA CTTGTCTTTA
3451 TT

Figure 9


1 EYIYKWCSVR VALYRQRSRT ALSKGKDYLV LAQPPLLLPA ESVDVSVLQP

51 LSGEUStnm EREAFIWLG DLSSSMHQDD LVNHTVTDAVE XaEPSHSQTLS

101 QSLLLDITPE INPLPKIE\7S ESVEYE AGHI PSPVIPQESS VEICNETEQK

151 SESFSSIEKP SHYEINKVN EUSDtaiFED MNSMQ IFTK L SETIVPPINT

201 ATVPEMEDGE AKMNIWITaC QTLISWDSS SLPEVKEEEQ SPEEALLRGL

251 QRTATESFYAE LQ^STDLOTA tISMLVHGSNQ KESVFlSai^ RIKALEVNMS

301 LSGRYLEELS QRYKKQMEEK QKAFNKTIVK LQNTSRIAEE QDQRQTEAIQ

351 LLQAQLTOWr QLVSrJLSATV AELKREVSDR QSVLVISLVL CWLGIMXM

401 QRCRNTSQFD GCYI5KLPKS NQYPSPKRCF SSYDDMNLKR RTSFPUMRSK

451 SLQLTGKEVD PNDLYIVEPL KFSPEKKKKR CKYKIEKIET IKPEEPLHPI

501 ANGDIKGRKP FTO QRDFSN M GE\nrHSSYKG PPSEGSSETS SQSEESYFCG

551 ISACTSLCNG QSQKXKTOECR AUCHRRSia^Q DQQKLIKTLX QTKSGSLPSL

601 HDIIKGNKEI TVGIFGVTAV S6HI*N*LNF SYRRLFCCXS UCMSL^YUCG

651 OlQfUfJFLBF •KVDGIVIHL G-^ATVIiQS* SLPIRIMVDI

701 L^RCFFTRLI TGTKV7IWKPS SLGGIGMKA* TSSFSFVPIS CTFPVLCAFC

751 LTIMPLEEEXS •LFLLFDETY NFVRFLKLQT LQCFEXSVCA* SSG\A^RQSK

801 DPKNLPTGSL FSKLTC3NEHL MEFL SLFC ^V DGDftL VIFTy SGWITSYLVT

851 NSMRKKSLQD LFLQWDICR QIFDKFTF»T RR-PICEGFL •LTF'TYTIN

901 TNPPNFHCPy •YEYKI^RFG QLVQVS-YNH SLHTVTAQIQL VSLSSLI^LV

951 KSKEIIIP^C LLCIGYKCAE VIHM-CRCIiC LFFCL^KIIG SNCI«IK*FL

1001 SMIVQ»*MKV EHVSF-KGEaJ •PFIWMFTCL •LIEHF**** LFLNLPNTFL

1051 GYCL'CDLFN AFFV7CLSCCF RLTACFRRFK FLSCLHN«U^ RAKGPDFIEA

1101 P*KEV(9fRA£ IQ«E1H*SVC CGKRimi«L RSCSAIRNCE FPNKSEKLSL



1 EYIYKWCSVR VTOiYRQRSRT ALSKGKDYLV LAQPPLLLPA ESVDVSVLQP

51 LSGELENmr EREAETWLG DLSSSMHQDD LVNHTVDAVE LEPSHSQTLS

101 QSLLLDITPE INPLPKIEVS ESVEYEAGHI PSPVIPQESS VEIENETBQK

151 SESFSSIEKP SITYEINKVN ELMDNIIKED MNSMQ IFTK L SETIVPPINT

201 ATVPDNEDGE AKMNIAZXEAK QTLISWDSS SLPEVKEEEQ SPEDALLRGL

251 QRTATDFYAE LQ^SSTOCiSYA NGNLVHGSMQ KESVFMRU3N RIKALEVNMS

301 LSGRYLEELS QRYRKQMEEM QKAFTOCnVK LQNTSRIAEE QDQRQTEAIQ

351 LLQAQLTOMT QLVSNLSAOV AEUCREVSDR Q SYLVISLVL CWLGLMLCM

401 QRCRNTSQFD GDYISKLPKS NQYPSPKRCF SSYTOMNLKR RTSFPIi^RSK

451 SLQLTGKEVD PNDLYIVEPL KFSPEKKKKR CK^KIEKIET IKPEEPLHPI

501 ANGDIKGRKP FTOQRDFSNM GEVYHSSYKG PPSEGSSETS SQSEESYFCG

551 IS^CTSLCMG QSQKTECXEKR ALKRRRSKVQ DQGKLIKTLI QTKSGSLPSL

601 HDIIKGNKEI TVGTtUVTAV SGHI
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Figure 10 



+ strand (sense) sequence (5' -> 3')

 1. pch8-sp6-1f    369   GCT AAG CCA GAG CTA CAG G
 2. pch8-sp6-2f    677   tCT GAT CTT CTG cm (CTC)
 3. pch8-1fa      1238   TCT GAA CTG CCT GAG AGA C
 4. pch8-2f       1462   CCA AAT GGG AGC ATT ACA AG
 5. pch8-3f       1745   TCA TCA AAT GAT CAG AAC C
 6. pch8-4f       1995   ATT CTG GAG AGT TGG TAT CC
 7. pch8-5f       2277   GGA ATA AGG AAA GAG CTT G
 8. pch8-6f       2559   TCC ACT CAT ATT CCA ATA CC
 9. pch8-5rb      2849   CCT GAG AGA CAG AAC TGT TC
10. pch8-4rb      3090   GGA CCC TTC ACT TCC TTA C
11. pch8-3rb      3370   GGC CAC CAC TTG TCC TGG G
12. pch8-2rb      3517   CAG AAC AGT GCT CTA ACT G
13. pch8-1rb      3970   GTA CTG CCT CTC TTA AAT G

- strand (antisense) sequence (5' -> 3')

14. pch8-2r       3617   CAG TTA CAG CAC TGT TCT G
15. pch8-3r       3360   CCC AGG ACA AGT GGT GGC C
16. pch8-4r       3140   GTA AGG AAG TGA AGG GTC C
17. pch8-5r       3849   GAA CAG TTC TGT CTC TCA GG
18. pch8-6r       3563   CTT GGG TAT TGG AAT ATG AG
19. pch8-5fb      2277   CAA GCT CTT TCC TTA TTC C
20. pch8-4fb      1999   ATA GGA TAC CAA CTC TCC AG
21. pch8-3fb      1746   TGG TTC TGA TCA TTT GAT G
22. pch8-2fb      1462   CTT GTA ATG CTC CCA TTT GG
23. pch8-1fb      1238   GTC TCT CAG GCA GTT CAG A
24. pch8-fb-1f     941   GTA GAG AAT CAC GTA CAG C
25. pch8-fb-2f     612   CAA TGA CCA GTA GCA TAA C
26. CH8-3670      3891   CAG CAT TTA AGA GAG GCA G
27. CH8a           387   CCT GTA GCT CTG GCT TAG CAT CC
28. CH8b           510   CCC CTT CAT TGA GAT CAT CTA G
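
Several entries in the primer list above appear to be reverse complements of one another (for example pch8-5f and pch8-5fb, both listed at first base 2277). The following is a minimal Python sketch, not part of the original disclosure, for checking that relationship; the function name `reverse_complement` and the variable names are illustrative assumptions.

```python
# Illustrative helper (not from the source document): reverse-complement a
# primer written in space-separated triplets, as in the primer list above.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence, ignoring spaces."""
    cleaned = seq.replace(" ", "").upper()
    return cleaned.translate(COMPLEMENT)[::-1]


# Primer sequences copied from the list above.
pch8_5f = "GGA ATA AGG AAA GAG CTT G"
pch8_5fb = "CAA GCT CTT TCC TTA TTC C"

# True when the two primers anneal to opposite strands of the same site.
is_pair = reverse_complement(pch8_5f) == pch8_5fb.replace(" ", "")
```

The same check can be applied to other f/fb or r/rb pairs, bearing in mind that some listed primers are offset by a few bases and that lowercase or parenthesized bases in the list may mark deliberate mismatches.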
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Figure 11(A)



1 GTGCX3CCGTC5 GCGCGGCCCG GCTGACAC3GT TCTTTAATGG AGGAGCCAAT 

51 CTCTCTGCAC ACXrrGGTTTC ATCTAATAAT ATACAGRCAC CAGCTCTCAG 

101 GCCAGTTAAT CATC CCCAGT GTCCAGGCAC AGAGTAfilCG GTCCGCCTCA 

151 CAATOTTOGA CTITCTAGCX: GAGftAC3^ACC TCTGTQGCCA AGCAATCCTA 

201 AGGATTGTIT CC TO TO G IAA TGCCATCATT GCTGAACITT TGAGACTCIC 

251 TGftGTTTATT CCTGCTGTCT TCAOGTTAAA AGACAGAGCT GATCAACAGA 

301 AATAlGGftGA TATCATATTT GATTTCAGCT ATTTTAAOGG TCCAGAATTA 

351 TGGGAAAGCA AACTGGATGC TAAGCCAGAG CTACAGGATT TAGATCAAGA 

401 ATTTCGTGAA AACAACATAG AAATTGTGAC CAGATTTTAT TOAGCATrrc

451 AAAGTGTACA TAAATATATT GTAGACTTAA ACAGATATCT AGATCATCTC 

501 AATGAAGGGG TTTATATTCA GCAAACCTTA GAAACTCTCC TTCTCAATCA 

551 AGATGGAAAA CAACTTCTAT GTGAAGCawrT GEACTIATAT GGAGTTATOC 

601 TACTSGT CAT TGACCAAAAG ATTSAAQGAG AAGTCAGAGA GAGQATCCTC 

651 GmCTEACT ACCGATACAG TGCTQCTOGA 'ISJTll.'l U .- m ATTCAAATAT 

701 GGACGATATT TGTAAGCTGC TTCGAAGTAC AGGTTATICT AGCCAACCAG 

751 GTGCCAAAAG ACCATCCAAC TATCCCGAGA GCTATTTCCA GAGAGTCCCT 

801 ATCAAOGAAT CCTTCATCAG TATGOICATT GGTCGACTGA GATCTCATCA 

851 TATTTACAAC CAGGTCTCAG CGTATCCTTT GCCGGAGCAT CGCAGCACAG 

901 CCCTGGCAAA CCAAGCTGCC ATGCTGTACG TGAT7CTCTA CTrKSAGCCT 

951 TCCATCCTTC ACACCCATCA AGCAAAAATG AGAGAGATAG TCGATAAATA 

1001 CTTTCCAGAT AATTGGGTAA TTAGTATTTA CaOXXSGGATC ACSUSmATC 

1051 TAGTAGATGC TTGGGAACCT TACAAAGCTG CAAAAACTCC TTTAAATAAT 

1101 ACCCTGGACC TTICAAATGT CAGAGAACAG GCAAGCAGAT A«3CTACTCT 

1151 CAGTGAAAGA GTGCATGCTC AAOTGCAGCA ATTTCTAAAA GAAGGTrATT 

1201 TAAGGGAGGA GATGGTTCTG GACaATATCC CAAAGCTTCT GAACTCCCTG 

1251 AGA6MTCCA ATGTTCCCar CCGATGGCTG ATGCTICATA CAGCAGACTC 

1301 AGCCTGTOAC CCAAACAACA AACGCCTTCG TCAAATCAAG GACCAGATTC 

"51 TAACAGACTC TCGGTACAAT CCCAGGATCC TCTTCCAGCT GCTCTTAGAT 

1401 ACTGCACAAT TTGAGTTTAT ACTCAAAGAG ATGTTCAAGC AAATCCnTC 

1451 AGAAAAGCAA ACCAAATQGG AGCATTACAA GAAAGAGGGT TCGQftGCGGA 

1501 TQACTGAQCT TGCTGATGTC rPPTCAGGAG aX3AAACCXXT AACOVGAGTO 

1551 GAGA AAAAT G AAAACCTTCA AGCTTGGTrC AGAGAGASCT CAAAACAAAT 

1601 AlTCTCirrA AATIATQATG ATTCTACTGC TGCGGQCAGA AAAACTCTAC 

1651 AACTGATACA AGCTTTGGAA GAGG1TCAA6 AATTCCACCA GTTOGAATCC 

1701 AATCTGCAAG TATGTCAGTT TCTTGOCGAT ACTCGAAAGT TTCITCATCA

1751 AATGATCAGA ACCaTTAACA TEAAAGAGGA GGTTCTGATC ACAATCCAGA 

1801 TCGTTCGGGA CCTTTCTTTC GCTTGGCAGT TGATTGACAG TTICACATCC 

1851 ATCATGCAAG AAAGCATAAG GGTAAATCCA TCCATCGTTA CTAAACTCAG 

1901 AGCTACCITC CTAAAGCTTG CCTCTCCCCT CGATCTCCCC ^CTTOsS 

1951 TTAATCAGGC AAATCGCCCC GACCTGCTCA GCGTCTCACA GlSwrTATICT 

2001 GGAGAGTTGG TATOCTATGT GAGAAAAGTT TOGCAGATCA TCCCaGAAAG 

2051 CATGTTTACa TCTCITCTAA AGATCATAAA GCTTCAGACC CACGACATTA 

2101 TOSAAGTGCX: TAOXQCCTG GACaAAGACA AGCTGAQGGA CTATCCTCAG 

2151 CTAQGCCCAC GMAOGAGGT TCCXAAGCTT ACTCATCCTA TITCCATITr 

2201 TACTOAAGGC ATCTTAATGA TGAAAACGAC TTTGGTTGGC ATCATCAAGG 

2251 TOGATOCAAA GCAGITSCTG GAAGATQGAA TAAGGAAAGA GCrTOTCAAG 

2301 SSH^'^S?^ TTQCCCTGCA TAGGGGACTG ATATTCAAOC CTCGAGCCAA

2351 GCCAAGTGAA TT GATGC CCA AGCTGAAAGA GTTCGGAGCG ACCAIGGAIG

2401 GAITCCATCG TTCTTTTGAA TACATACAGG ACTATGTCAA CATTIATCGT 

2451 CTGAAGATTT GGCAGGAAGA AGTATCTCGT ATCATAAATT ACAACGXGGA 

2501 GCAAGAGTOT AATAACITIC TAAGAACGAA GATTCAAGAT TGGCAAAGCA 

2551 TOTACCAGflC CACOCATATT CCAATACCCA AGTTOACXXX: TOKaSAOGAG 
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Figure 11(B)
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3901 
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3951 
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TTATTOGTCG ACTCTGCAGA 
ACAOXSnCACA TAGACCAGCT 
G GAAG TGACC AGCAGCCGCX: 
CCTTTOGTCT AAATGGCTTA 
GAGTTACAGA ATTTCCTCAG 
AACTCTTCAG QACACTTTAA 
AAAGTATZGT CGCAAATICA 
ACACAGAAGA TTTOGACTOC 
GATGCAGATT CTGAGGCAAC 
GGTTTGATTC TAAACATCTG 
CTCCTAGCAG ACATTQAAGC 
CAAAGAAGAT AACACACTTT 
CTGGCATTCA CAACCCACTG 
CCCTATTTTC CAATTCTAAA 
ACTTCAATAC AACAAAAATC 
CXXSTTGATTG GCX^CCACTT 
TTCCATTCCC GGTACACCGA 
CTGCTCCACG GTGGAGCAGT 
CAGATGiTGT GGGTCCCXriT 
AAGCTACCCA GGAGGGTTGC 
TGAGTTCAGA ACAGTGCTGT 
ATTGTCCTTA GATCTTCCCA 
AACTCAGTTG CTCATACAAC 
CAG ACGTT AT GAGTAAGATA 
ATTGTTTAAA TCATCGTATT 
CACAl'lTnXi TACIGC XrrCT 
AATCCAnTA GnTTATGTT 
TCAGTAAAAT AGTATTACTA 



GAAATCCTGC GGATCACAGA 
GAACACTTOG TATGATATGA 
TCTTCTCAGA AATCCAGACC 
GACAGGCTTC TGTGCriTAT 
TATGTTTCAG AAAATTATCC 
AAACCCTCAT GAATGCIGTC 
AATAAAATTT AITITTCOGC 
GTATCTCGAG GCTATAATXSA 
AGATTGCXAA TGAATTAAAT 
GCAGCTGCTC TGGAGAATCT 
CCACTATCAG GACCCTTCAC 
TATATGAAAT CACAGCCTAT 
AATAAGATAT ACATAACAAC 
CTTTCTATTT TTGATCGCTC 
ICSGGAATGGT CTGCCGAAAA 
GTCCTGQQAC TGCTCACTCT 
GCAGCTCCTG GCGCTGATTG 
GTACAAGCCA GAAGATACCT 
CTGTTCCTGG AGGATTATGT 
TGAAGCACAT GTGCXrrAATT 
AACTGTTITT CCTACTTCTT 
CXrATCACAAA TGAAITTGAA 
TGCATriTiT CTC3TCTATTA 
TATCTCAOXSG CATrAGTTAA 
ACAT GCAATT TATATCAGAT 
CTTAAATGCT GAATGTAACT 
CTAAAGAACT ATITOTGCAA 
GT 
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Figure 12(A) 



APWRSPADRF fNQGANLSAH LVSSNNIQfTP ALRPVNHPQC PGTE»SVRLT 
MLDFLAENNL CGQAILEIIVS CGNAIIAELL R LSEF IPAVF RUCDRADQQK 
YCTIIFDFSY FKGPBUWESK IJDAKPELQ13L CEEFRENNIE IVTOFYIAFQ 
SVmWTVDLN RYLDDLNEGV YIQQTICTVL LNEDC3KQLLC EALYLYGVML 
LVIDQKIBGE VRERMLVSYY RYSAARSSM) SNMDDICKLL. RSTGYSSQPG 
AKRPSNYPES YFXJKVPINES PISMVIGRLR SDDIYNQfVSA YPLPEHRSTA 
LANQAAMLYV ILYFEPSIM THQAKMREIV DKYFPDNWVI SIVMSITVNL 
VEWWEPYKAA KTAUOOTLDL SNVREQASRY ATVSERVHAQ VQQFIJCBC3YL 
REEMVLCNIP KLmCLRDCN VA1RWU1LHT ADSACDPNNK RLRQOTQIL 
TDSRYNPRIL FQLLUTTAQF EFILKEKFKQ HLSEKQTKWE HYKKEGSERM 
TELADTFSGV KPLTRVEKNE NLQAMFREIS KQILSUmDD STAAGEOTVQ 
LIQALEEVQE FHQLESNLCfV CQFLAOTRKF LHQMIRTINI KEEVLITOQI 
VQJIiSFAWQL IDSFTSIMQE SIRWNPSMVT KLRATFliOA SALDLPLLRI 
NC3ANRPDLLS V9QYYSGELV SYVRKVLQII EESMFTSLLK HKL gTHDII 
EVPTRIDKDK LREYAQLC3PR YE7AKLTHAI SIFTiXSIUM RTTLVGIIKV 
DPKQLLEDGI RKELVKRVAF AIHRGLIFNP RAKPSEIMPK LKEl/SATODG 
FHRSFEYIQD YVNIYGLKIW QEEVSRIIOT NVBQBCNNFL RTKIQEWQSM 
YQSTOIPIPK FTPVDESVTF IGRLCREILR ITDPKMTCHI DOLNTVWEMK 
THQEVTSSRL FSEIQTTLCT FGLNSLDBLL CFMZVKEI^ FLSMPQKIIL 
KDRTTQOTLK TLMNAVSPIK SIVMJSNKIY FSAIAKTQKI WTAYLEAIMK 
VGCWQILRQQ IANEI2IYSCR FDSKHIAAAL ENLNKALLAD lEAHYQDPSL 
PYFKEmriiL YEITAYLEAA GIHNPLNKIY ITTKRLPYFP IVNFLFLIAQ 
LPKLQYNKNL GMVCRKPTDP VDWPPLVLGL LTLLKQFHSR YTEQLLALIG 
QFICSTVEQC TSQKIPEIPA DWGAIUl-E DYVRYTKLPR KVAEAHVPNF 
IFDEFRTVL* UFFLLl^QWKD CP*IFPP9QM NLKMKRNSVA HTTAFFLSIM 
GNIRRYE^DI SHGIS-YN*Y CLNHGITCNL YQIKAEHIFV LPLLNAECNC 
YV*IKLVLCS KELFVQUQIF SKIVLL 



Figure



MLDFLAmJL CGQAILRIVS COOaiAELL RLSEFTPAVF RLKDRADQQK 
YGDIIFDFSY FTOPELWESK LDAKPELQDL DEEFRENNIE IVTRFYLAFQ 
SVHKyiVDLN BYLDDIUBGV YIQQTLETVL LNEIX3KQLLC EALYLYGVML 
LVIEQKIK^ VRERMLVSVY RYSAARSSAD SNKDDIOCLL RSTCysSQPG 
AKRPSNYPES YFQRVPINES FISMVIGRLR SDDIVNQVSA YPLPEHRSTA 
LAMQMMLYV ILVFEPSI1« IHQMOJREIV IHOTPENW71 SIVMSITVNL 
VmWEPYKAA KTKliamXiL SNVRBQASRY ATVSERVHAQ VQQFUCEGYL 
REEMVLCNIP KU2«XRDCN VHmnUSLXT ADSAC3>PNNK RLRQIKDQIL 
TDSRYNPRIL PQLLU3TAQF EFILKEMFKO MLSEKQTOWE HYKKBGSERM 
TEIADVFSGV KPLTIWEKNE NLQAWFREIS KQILSIIIVDD STAAGRKTVO 
LIQALEEVQE FHQLESNIW CCFIAETRKF LHQMIRTINl KEEVLITMOI 
VGDLSEAWQL IDSFTSIMQE SIRVNPSMVT KLRATFUOA SAUDLKLLRI 
NQANRPDLLS VSQYySGELV SYVRKWiQIl PESMFTSLLK IIKLQIHDII 
EVPTRLDfCDK WmUQUSPR VEVAKLIHM SIFTBSII«M RTTLVGIIKV 
DPKQLLEDGI RKELVKFWAP ALHRGLIFNP RAKPSELMPK LKELGAIMDG 
FHRSFEYIQD YVNTSfGLKIW QEEVSRIINY NVEQECNNFL RIKIQDWQSM 
YQSTOIPIPK FTPVDESVTF IGRLCREIUl ITDPKMTCHI DfflJUWyCMK 
THQEVTSSRI, FSEIQTTLGT FGUKSLDKhL CFMIVKELON FLSMFXJKllL 
RERTVQDTUC TUflOAVSPUC SIVANSNKIY FSAIARXOKI WTAYLEAIMK 
VGQMQILRQO lANELNiTSCR FDSKHIAAAL ENUOCALLAD lEAHYQDPSL 
PVPKEDMTLL YEITAYLEAA GIHNPI2JK1Y rmRLPYFP IVNFIFLIAD 
LPKKJYNKNL GMVCFKPTDP VEWPPLVU3L LTLLKOPHSR YTBQLIALIG 
PFI CSTVE QC TSQKIPEIPA DWGALLFLE DYVROTKLPR KWAEAHV5NF 
IFLtU'KIVL 
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Figure 13(A)



AG6 GGC QGA AOT CGG GOT CTG ACC CGC TCC AGG TCC GGG ACT GCG GAT 
AGA AGA GGA CCG CCG CCT TGA GGG AGG GGT GGA AAC TGG GTG CCG GCT 
CCG CGC GCG ACC TCC GGC CCT GCG CGT GCG CCG TGG CGC GGC CCG GCT 
GAC AGG TTC TTT AAT GGA GGA GCC AAT CTC TCT GCA CAC CTG GTT TCA 
TCT AAT AAT ATA CAG ACA CCA GCT CTG AGG CCA GTT AAT CAT CCC CAG 
TGT CCA GGC ACA GAG TAG TCG GTC CGC CTC ACA ATG TTG GAC TTT CTA 
GCC GAG AAC AAC CTC TGT GGC CAA GCA ATC CTA AGG ATT GTT TCC TGT 
GGT AAT GCC ATC ATT GCT GAA CTT TTG AGA CTC TCT GAG TTT ATT CCT 
GCT GTG TTC AGG TTA AAA GAC AGA GCT GAT CAA CAG AAA TAT GGA GAT 
ATC ATA TTT GAT TTC AGC TAT TTT AAG GGT CCA GAA TTA TGG GAA AGC 
AAA CTG GAT GCT AAG CCA GAG CTA CAG GAT TTA GAT GAA GAA TTT CGT 
GAA AAC AAC ATA GAA ATT GTG ACC AGA TTT TAT TTA GCA TTT CAA AGT 
GTA CAT AAA TAT ATT GTA GAC TTA AAC AGA TAT CTA GAT GAT CTC AAT 
GAA GGG GTT TAT ATT CAG CAA ACC TTA GAA ACT GTG CTT CTC AAT GAA 
GAT GGA AAA CAA CTT CTA TGT GAA GCA CTG TAC TTA TAT GGA GTT ATG 
CTA CTG GTC ATT GAC CAA AAG ATT GAA GGA GAA GTC AGA GAG AGG ATG 
CTG GTT TCT TAC TAC CGA TAC AGT GCT GCT CGA TCT TCT GCT GAT TCA 
AAT ATG GAC GAT ATT TGT AAG CTG CTT CGA AGT ACA GGT TAT TCT AGC 
CAA CCA GOT GCC AAA AGA CCA TCC AAC TAT CCC GAG AGC TAT TTC CAG 
AGA GTG CCT ATC AAC GAA TCC TTC ATC AGT ATG GTC ATT GGT CGA CTG 
AGA TCT GAT GAT ATT TAC AAC CAG GTC TCA GCG TAT CCT TTG CCG GAG 
CAT CGC AGC ACA GCC CTG GCA AAC CAA GCT GCC ATG CTG TAC GTG ATT 
CTC TAC TTT GAG CCT TCC ATC CTT CAC ACC CAT CAA GCA AAA ATG AGA 
GAG ATA GTG GAT AAA TAC TTT CCA GAT AAT TGG GTA ATT AGT ATT TAC 
ATG GGG ATC ACA GTT AAT CTA GTA GAT GCT TGG GAA CCT TAC AAA GCT 
GCA TkAA ACT GCT TTA AAT AAT ACC CTG GAC CTT TCA AAT GTC AGA GAA 
CAG GCA AGC AGA TAT GCT ACT GTC AGT GAA AGA GTG CAT GCT CAA GTG 
CAG CAA TTT CTA AAA GAA GGT TAT TTA AGG GAG GAG ATG GTT CTG GAC 
AAT ATC CCA AAG CTT CTG AAC TGC CTG AGA GAC TGC AAT GTT GCC ATC 
CGA TGG CTG ATG CTT CAT ACA GCA GAC TCA GCC TGT GAC CCA AAC AAC 
AAA CGC CTT CGT CAA ATC AAG GAC CAG ATT CTA ACA GAC TCT CGG TAC 
AAT CCC AGG ATC CTC TTC CAG CTG CTG TTA GAT ACT GCA CAA TTT GAG 
TTT ATA CTC AAA GAG ATG TTC AAG CAA ATG CTT TCA GAA AAG CAA ACC 
AAA TGG GAG CAT TAC AAG AAA GAG GGT TCG GAG CGG ATG ACT GAG CTT 
GCT GAT GTC TTT TCA GGA GTG AAA CCC CTA ACC AGA GTG GAG AAA AAT 
GAA AAC CTT CAA GCT TGG TTC AGA GAG ATC TCA AAA CAA ATA TTG TCT 
TTA AAT TAT GAT GAT TCT ACT GCT GCG GGC AGA AAA ACT GTA CAA CTG 
ATA CAA GCT TTG GAA GAG GTT CAA GAA TTC CAC CAG TTG GAA TCC AAT 
CTG CAA GTA TGT CAG TTT CTT GCC GAT ACT CGA AAG TTT CTT CAT CAA 
ATG ATC AGA ACC ATT AAC ATT AAA GAG GAG GTT CTG ATC ACA ATG CAG 
ATC GTT GGG GAC CTT TCT TTC GCT TGG CAG TTG ATT GAC AGT TTC ACA 
TCC ATC ATG CAA GAA AGC ATA AGG GTA AAT CCA TCC ATG GTT ACT AAA 
CTC AGA GCT ACC TTC CTA AAG CTT GCC TCT GCC CTC GAT CTG CCC CTT 
CTT CGT ATT AAT CAG GCA AAT CGC CCC GAC CTG CTC AGC GTG TCA CAG 
TAC TAT TCT GGA GAG TTG GTA TCC TAT GTG AGA AAA GTT TTG CAG ATC 
ATC CCA GAA AGC ATG TTT ACA TCT CTT CTA AAG ATC ATA AAG CTT CAG 
ACC CAC GAC ATT ATT GAA GTG CCT ACC CGC CTG GAC AAA GAC AAG CTG 
AGG GAC TAT GCT CAG CTA GGC CCA CGA TAC GAG GTT GCC AAG CTT ACT 
CAT GCT ATT TCC ATT TTT ACT GAA GGC ATC TTA ATG ATG AAA ACG ACT 
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Figure 13(B)
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Figure 14(B)



Leu Ile Phe Asn Pro Arg Ala Lys Pro Ser Glu Leu Met Pro Lys Leu
Lys Glu Leu Gly Ala Thr Met Asp Gly Phe His Arg Ser Phe Glu Tyr
Ile Gln Asp Tyr Val Asn Ile Tyr Gly Leu Lys Ile Trp Gln Glu Glu
Val Ser Arg Ile Ile Asn Tyr Asn Val Glu Gln Glu Cys Asn Asn Phe
Leu Arg Thr Lys Ile Gln Asp Trp Gln Ser Met Tyr Gln Ser Thr His
Ile Pro Ile Pro Lys Phe Thr Pro Val Asp Glu Ser Val Thr Phe Ile
Gly Arg Leu Cys Arg Glu Ile Leu Arg Ile Thr Asp Pro Lys Met Thr
Cys His Ile Asp Gln Leu Asn Thr Trp Tyr Asp Met Lys Thr His Gln
Glu Val Thr Ser Ser Arg Leu Phe Ser Glu Ile Gln Thr Thr Leu Gly
Thr Phe Gly Leu Asn Gly Leu Asp Arg Leu Leu Cys Phe Met Ile Val
Lys Glu Leu Gln Asn Phe Leu Ser Met Phe Gln Lys Ile Ile Leu Arg
Asp Arg Thr Val Gln Asp Thr Leu Lys Thr Leu Met Asn Ala Val Ser
Pro Leu Lys Ser Ile Val Ala Asn Ser Asn Lys Ile Tyr Phe Ser Ala
Ile Ala Lys Thr Gln Lys Ile Trp Thr Ala Tyr Leu Glu Ala Ile Met
Lys Val Gly Gln Met Gln Ile Leu Arg Gln Gln Ile Ala Asn Glu Leu
Asn Tyr Ser Cys Arg Phe Asp Ser Lys His Leu Ala Ala Ala Leu Glu
Asn Leu Asn Lys Ala Leu Leu Ala Asp Ile Glu Ala His Tyr Gln Asp
Pro Ser Leu Pro Tyr Pro Lys Glu Asp Asn Thr Leu Leu Tyr Glu Ile
Thr Ala Tyr Leu Glu Ala Ala Gly Ile His Asn Pro Leu Asn Lys Ile
Tyr Ile Thr Thr Lys Arg Leu Pro Tyr Phe Pro Ile Val Asn Phe Leu
Phe Leu Ile Ala Gln Leu Pro Lys Leu Gln Tyr Asn Lys Asn Leu Gly
Met Val Cys Arg Lys Pro Thr Asp Pro Val Asp Trp Pro Pro Leu Val
Leu Gly Leu Leu Thr Leu Leu Lys Gln Phe His Ser Arg Tyr Thr Glu
Gln Leu Leu Ala Leu Ile Gly Gln Phe Ile Cys Ser Thr Val Glu Gln
Cys Thr Ser Gln Lys Ile Pro Glu Ile Pro Ala Asp Val Val Gly Ala
Leu Leu Phe Leu Glu Asp Tyr Val Arg Tyr Thr Lys Leu Pro Arg Arg
Val Ala Glu Ala His Val Pro Asn Phe Ile Phe Asp Glu Phe Arg Thr
Val Leu * Leu Phe Phe Leu Leu Leu Gln Trp Lys Asp Cys Pro *
Ile Phe Pro Pro Ser Gln Met Asn Leu Lys Met Lys Arg Asn Ser Val
Ala His Thr Thr Ala Phe Phe Leu Ser Ile Met Gly Asn Ile Arg Arg
Tyr Glu * Asp Ile Ser His Gly Ile Ser * Tyr Asn * Tyr Cys
Leu Asn His Gly Ile Thr Cys Asn Leu Tyr Gln Ile Lys Ala Glu His
Ile Phe Val Leu Pro Leu Leu Asn Ala Glu Cys Asn Cys Tyr Val *
Ile His Leu Val Leu Cys Ser Lys Glu Leu Phe Val Gln Leu Gln Ile
Phe Ser Lys Ile Val Leu Leu



WO 97/38085

PCT/US97/05930



Figure 15



+ strand (sense) sequence (5' -> 3')

                    1st base
 1. pch13-sp6-1f      370   TTT ACT TCT AAC GCT TAT TC
 2. pch13-sp6-2f      726   TGA AGG AGT CCT TTG AGA CG
 3. T7.1             1140   TCA CAA TGG GCT ACT GG
 4. T7.2             1361   TTC AAC GAG GGA GAT GG
 5. T7.3             1602   TTA GCA CCA CTG AGA GA
 6. T7.4             2041   GTT CTT TTA GGC ATT TA
 7. ch13-2480        2486   GCT GCG TCT GTT CGT CAG C

- strand (antisense)

 8. SP6.1            2746   CCT CTG CTT CAC AAC AT
 9. SP6.2            2490   CCA GtA (C) GTC GGG CGG ACA CC
10. SP6.3            2213   AGG TTC TTC ATT GT
11. SP6.4            1812   GGA TTG TCT TTG TCT CT
12. pch13-t7-1f      1165   ACT GCA CTT CCA TGG GCG TG
13. pch13-t7-1fa      712   CCT TCA TCA GGT TGA CGA AC
14. pch13-t7-2fa      286   GCG GCA ATC AGA AAC GGA AG
15. CH13-AS-1         536   TGA ACA CGT GGT ACA T








Figure 16(A)



1 CTICCCTGAG CCCTTTCTGC CTGTGTAGGA AGCAGAAGGC GGAATGTCGG 

51 CTCTGCCCTT CTCCGTAAGA TGGTGGATTA AAACX5TTCCT TATAAACTGG 

101 AAATGAAOGC TTGGGAAGAT GGSCTAAAATC AGCAATCXTTT GGAATAACGC 

151 AGAAGCATCC CTOCITOCCT GQGCXXXSCCC GTCGGCCTGC TTGT G CTOTT 

201 CAGTAGGTG6 TTTTTAGAAA GGGCTTCCTT CAGCGTCATT AGCAACAOGA 

251 GTCGTCGTCC GTTTCCATGA GGAAATGTTC TTAACCTTCC GTTTCTGATT 

301 GCCTCTAGAC TCCATCTGTC ATAGACAAAT GCCCCCATCT TTTACAGAGA 

351 ACCAGTCTCT TCTITAAACT TTACTTCTAA CGCTTATTCT TTTTACCTTA 

401 TATAGGAAAC CACTGATTGC TTGTGTGGAG AAACAGCTAT TAGGAGAACA 

451 TTTAACAGCA ATTCTGCAGA AAGGGCTCGA CCACTTACTG GATGAGAACA 

501 GAGTGOCGGA CCTCQCACAG ATGTACXAGC TGTTCAGCCG GGTGAGQGGC 

551 GGGCAGCAGG CGCTGCTQCA GCACTGGAGC GAGTACATCA AGACTTTIGG 

601 AACAGOGATC GTAATCAATC CTGAGAAAGA CAAAGACATG GTCCAAGACC 

651 TGTTGGACTT CAAGGACAAG GTG6ACCACG TGAICGAGGT CTGCTTCCAG 

701 AAGAATGAOC GGTTCGTCAA CCTGATGAAG GAGTCCTTTG AGACX3TTCAT 

751 CAACAAGA6A CCCAACAAGC CTGCAGAACT GATCGCAAAG CATGTGGATT 

801 CAAAGTTAAG AGCAGGCAAC AAAGAAGCCA CAGACGAQGA GCTGGAGCGG 

851 ACGTTGGACA AGATCATGAT CCTGTTCAGG TTTATCCACG GTAAAGATGT 

901 CTTTGAAGCA TTTTATAAAA AAGATTTGGC AAAAAGACTC CTTGTTGGGA 

951 AAAGTGCCTC AGTCGATGCT GAAAAGTCTA TGTTGTCAAA GCTCAAGCAT 

1001 GAGTGCX3GTG CAGCXTTTCAC CAGCAAGCTG GAAGGCATGT TCAAGGACAT 

1051 GGAGCTTTCG AAGGACATX:A TGGTTCATTT CAAGCAGCAT ATGCAGAATC 

1101 AGAGTOACTC AGGCXXTTAXA GACCTCACAG TGAACATACT CACAATGGGC 

1151 TAC TOGCCA A CATACAOGCC CATG GRAGTO CACTT AACCC CAGAAATGAT 

1201 TAAACTTCAG GAAGTATTTA AGGCATTTTA TCTTOGAAAG CACAGTGGTC 

1251 GAAAACTTCA GTGGCAAACT ACmGGGAC ATGCTGTrTT AAAAGCGGAG 

1301 TTTAAAGAAG GGAAGAAGGA ATTCCAGGTG TCCCTCTTCC AGACACTGGT 

1351 GCTCCTCATG TTCAACGAGG GAGATGGCTT CAGCTTTGAG GAGATAAAAA 

1401 TGGCCACGQG GATAGAGGAT AGTGAATTGC GCAGAACGCT GCAGTCCCTG 

1451 GCCTGTGGCA AAGCACGTGT GCTGATTAAA AGTCCCAAAG GAAAGGAAGT 

1501 GGAAGATGGA GACAAGTTCA TTTTTAATGG AGAGTTCAAG CACAAGTTGT 

1551 TTAGAATAAA GATCAATCAA ATTCAGATGA AGGAAACTST TGAGGAACAG 

1601 GTTAGCACCA CTGAGAGAGT GTTTCAGGAT AGACAATATC AGATTGATGC 

1651 TGCTATCGTC AGAATAATQA AGATGAGAAA GACTCTTGGT CATAATCTTC 

1701 TAGTTTCTGA ATTATATAAT CAGCTGAAAT TTCCAGTAAA GCCTGGAGAT 

1751 TTSAAAAAGA GAATTGAATC TCTGATAGAC AGAGACTATA TGGAGAGAGA 

1801 CAAAGACAAT C3CGAATCAGT ACCACTACGT GGOCTGACGC ATCTGCAGAC 

1851 GGrrcCCXriT CATGAAACAC TAGAATCTAC CXnCAGACSCA GGAAGCACAC 

1901 CTGTGCCATr TCTGGGACTC TGATTGATCC AGCTGTGGAC ATTQGAAGGC 

1951 GAAGGAAGGG AGGTGGCTCC TGGGTCATCT TTCACAAGGC TCAAGACTTC 

2001 AACXrroCAGA TGTATCTnT TCCCICCAGT TTTTCCTCTA GTTCTTTIAG 

2051 GCATTTAAAT TGmCTGTT ACTCTGTQCA AAATAACTTT GAGATIOGAC 

2101 AAGAAGATGT TACTMAGAG AAGTrcCTTT AAAAOGTCTT GTICVil»' ' il> " r 

2151 CAAAAAGCTG CAAGTTTOGT TI G TTCT C GT GTGTGATCAT GAGTGCACAA 

2201 TGAAGAAGAC CCTAGATGCT GCATnTTTA GCTCTGAAGA TTCCTTAGGT 

2251 ATCCCTGAAG ACAGCTCGCT CAGATGATCA GCATTTAGAG TX3AAAACAAG 

2301 GGCCCTTCAT G0G7GAACAT TAGAAAG^ CAGGGTTCA A AGCTOGCGAA 

2351 TOGAIGACXSC ACCCTAGCCA CTOGCXXX-TC OCTGTTTCAT GTATTTCCAA 

2401 AAGTTGTAAA CITIOGICGC TGATTTTTCG TAAGTCAGGT TTCTAAGTGA 

2451 GCTCCCIGAG GTGCCAAGGC CATGGTOTOC GCXXTTOCTGC gTC TG r i l Xarr 

2501 CAGCTGAGTr OCTTGTGAAT CI CTCTXTi'A GGG G TT GG GG C rA GTG'XXjrrr 
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Figure 16(B) 



2551 TGTGTTTCCA TTCTAAGATT GAGTCTGGCA GTCCCTGTTT TTTTGCATTC

2601 GGGTAACTGC TCTTTCATTT TTTTTAATTG CAGTATTTCT GTGATTGCAA

2651 TAATAAAGTT TCGTTTGGTT TTTACAGTCA TCCGCAGGGA (XATCCTTOT

2701 TCTCTGCTGT AAACTGTAAA AAGTTTATGG AGACXTTAAAG TCTTGATCTT

2751 CTGAAGCAGA GGrTAnrro tcgaaagatt AAAAGGATTT TCTTGGTACX:

2801 TOGTiTTOTG TTCTGTATAT ATACATGAGG TTGAACAGTC AAAGGAAAST

2851 TCAGTAGTGA TCTTAGAAGG GTAACTATGA CAAAQATACT TTTGAGATAA

2901 CAirraAAAG TAlt^-x tATAT TTTACATAAT AGCATCTTTC ATTITCATTA

2951 AAAGCTACCA AAGGAATTTT GATCAIGGCA TAMTGTTTA AAGCAATATT

3001 TTCTQGAATA TACCAAGTTT ATATAATTTG ATTTTGTGCT AAATTATTAA

3051 GAGTCTCTTT TTGAAACATG CGGGTITCAA AXAOXSACACC TTGTQGGTTT

3101 CCATATTAAA ATCCTCACIC TTTAAITOTC ATTTTTATCT TTGAAAATTT

3151 TCATTTATGA GTTCCATGAT ATGTGGTCTA AGAAAGACCA AACAGATTTC

3201 TATmrrro TCTTATAAC5T TCC3T1X3TCTC TAGAGATTCT TAATATTOEA

3251 ATTTM30TA GACTTACTTT GAATAAAATT AGTTTAATTG GCCTTAAAAT

3301 TACA1TAATA AAACTTTGTG ATATGCAAAT GACACATTC
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Figure 17 



1 FPEPFLPV*E AEGGMSALPF SVRNCIKTFI« TtmH^BUSm AKISNFMNKA 

51 EASLLFWARP WACLCCSVG6 F*KGLPSASL ATGVWRLKE EMFLTFRF*L 

101 PLDCICHRQM PPSFTENQSL L-OIXLTLIL FTLYRKPLIA CVEKQLLGEH 

151 LTAILQKGLD HLLDENRVPD lAQM^QLFSR VRGGQQAIiLQ HWSEYIKTFXS 

201 TAIVINPEKD KDMVQDLLDF KDKVDHVIEV CFQKNERFVN IMKESFETFI 

251 KKRPNKPAEL XAKHVDSKLR AGNKEATDEE LERTLDEOMI LFRFIHSKDV 

301 FEAFYKKDIA KRLLVGKSAS VDAEfCSMLSK LKHECGAAFT SKLEGMFKCM 

351 ELSKDIMVHF KQHMQWQSDS GPIDLTVNIL OMGYWPTVTP MEVHLTPEMI 

401 KLQEVEKAPy LGKH SGRKLQ WQTTLCSHAVL KAEFICBGKKE FQVSLFQTLV 

451 LUff NEGOG F SFEE IKMATC lEDSELRRTL QSIACGKAPV LIKSPKGKEV 

501 mGDKFIFNG EFKHKLFRIK INQI(»iKElV EEQVSTTERV FQDRC2YQIEA 

551 AIVRIMKMRK TLGHNLLVSE LYNQLKFPVK PGDLKKRIES LIDRDYMERD 

601 KDNPNQYHYV A»RICRRFPF MKH*NVPSEQ EAKLCHFWDS D-SSCGHWKA 

651 KEGRWLLGHL SQGSRLQPAD VSFSLQFFL* FF-AFKLFLL LCAK»L«DWT 

701 RRCY-REVPL KGLVLVSKSC KFXa^SCVIM SAQ*RRP»ML HFLALKIP-V 

751 SLKTARSDDQ HLE«KQGPFM GEH-KEPGFK AGEWMTHPSH WPLPVSCISK 

801 SCKIM^LIFR KSGF*VSSLR CQGHGVRPAA SVRQLS5L*! SVLGVGASVF 

851 VFPF*D*VWQ SLFFCIGVTA L^FFLIAVFV •LQ»*SLVWF LQSCAGTILV 

901 LCCKL^KVYG DLKS^CCEAE VIUNKD*KDF VGTWFCWyi ^YMRLNSERBV 

951 Q*»C«KGtiyD RUfi'FKlTKKS TLYFT»»HVS F*LKATR5IL IMA^VFKAIF 

1001 SGIYQVYII» FCAKLLRVSF •NMRV^NTfTP CGFPY^NPHS LIVIFIFENF 

1051 HL*VP«YW» ERPNRFLFFF LISSLCLEIV NIVI»C3U.TL NKISLIGLKI 

1101 TLUCLCDMOM TH 



KiQUFSR VRGGQQALLQ mSEnKTFO 

201 TAlVZNPEiCD KCKVQDLLDF KCKVCHVIESr CFQKNERFVN UIKESFETFI 

251 NKRPNKPAEL lAKHVDSKLR AGNKEATDEE LERTLEHCIMI LFRFIHGKDV 

301 FEAFYKKXZLA KRLLVGKSAS VXAEKSMLSK LKHECGAAFT SKLEGMFKCM 

351 ELSKDIMVHF KOHMQNQSDS GPIDLTVNIL TMGYWPTYTP MEVHLTPEMI 

401 KLQE VFKAF Y LGKH SGRKLQ WQTHGHAVL KAEETCEGKKE FQVSLFQTLV 

451 LL MFNE GDGF SFEEIKMATG lEDSELRRTL QSLACGKAKV LIKSPKGKEV 

501 EDGDKFmJS EFKHKLFRIK IHQIQMKETV EEQVSTTERV FQDRQYQIDA 

551 AIVRIMKMRK TLGHNLLVSE LVNQLKFFVK PGDLKKRIES LIDRDYMERD 

601 KZXlPNQyHyV A 
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Figure 19 



1 GAAGATGATG ATTACGGGfTC TCGAACAGGA AGCATCTCCA GCAG T GTGTC 

51 TGTGCCTGCA AAGCCTCAAA GGAGACCTTC TCTTCCACCT TCTAAACAAG 

101 CTAACAAGAA TCTGATTTTG AAGGCTATAT CTGAAGCTCA AGAATCCGTA 

151 ACAAAAACAA CTAACTACTC TACAGTTOCA CAGAAACAGA CACTTCCAGT 

201 TCCTCCCAGA ACTCGAACTT CTCAAGAAGA ATTCCTAGCA GAAGTCGTCC 

251 AGGGACAAAG TAGGACXXXT AGAATAAGTC CXTCCATTAA AGAAGAGGAA 

301 ACAAAAGQAG ATTCTGTAGA AAAAAATCAA GCTGAGATGA GTCAAdXSAG 

351 TGTGGCACAG AAACCAGAAA AACTTTTGGA GCGCTGCAAG TACTCGCCTC 

401 CTTGTAAAAA TGGGGATGAG TGTGCCTACX: ATCACCCCAT CTCACCCTCC 

451 AAA GCCTTC C CXSUVTTGTAA ATTTGCTGAA AAATOrTTGr TrGTTCACCC 

501 AAATTCTAAA TATGATGCAA AGTGTACTAA ACCAGATIGT CXXnTCACTC 

551 ATGTGAGTAG AAGAATTCCA GTACTGTCTC CAAAACCAGT TGCACCACCA' 

601 GC AOC A OCOT CX AGTA GTCA QCTCTGCOGT TACTTCCCTG CTTGTAAGAA

651 GATCGAATGT CCCTTCTATC ATCCAAAACA TTGTAGGTTT AACACTCAAT 

701 GTACAAGTCC GGACTGCACA TTCTACCATC CCACCATTAA TGTCCXrACCA 

751 CGACATOCCT TOAAATGGAT OXrCACXTTCAA ACCAGCGAAT AGCACCCAGT 

801 CCTGCCTGGC AGAAGATCAT GCAGTTTCGA AGTTTTCATG TACTGATGAA 

851 AGATACTCTA CA GAACTT GT CAAATCTTTG AAACTrGGAA TATATIGCTT 

901 T CATAA TATG AAGTTTTATT GCCTATCTAT CTGAAGTGTC TAATTTTTCA 

951 AGTITGTAAG TTTATTATGT GGTTTTAACA TTGGGTCTTT TT GlT l' i XjTr 

1001 TTTACTATGA AAAGACAGCT TAAGGAAGAG CTAAATTCTG TTAAAATATT 

1051 TCGGG CATCT TTGTGCACTG CTGTTGIGAG GATCAGCATA TGAAATTGAC 

1101 ATCATGGTTA GTCATGGTAC TGCAGCTTAG GGGGCTACAC GOTTGCTOTG 

1151 TGAGTC3GAGA GATGCAGTGA GGCAGTTGTC ATTATTCTAA AAATTGTACT 

1201 ACrrrCACTT TTCCCAAAGA TTATATAATG rrCATAATCC ACCATCAAAA 

1251 CAGCATTGGC CAAAGGTACT GAGGCTGCTT AAAATATTCA ATTCTGCTTT 

1301 TTAGTTrrTA AGTGAMTIA GTTTGAAAAG CATGATTATA GAGGCCTCTC 

1351 GAGGCTGAGT GCTACTTTCG GTAAAGTTCC AGmTCCAG CCTTCTOTCA 

1401 CAGGATGAAT GAGGTGGGTA TGGACAGTGG AGGCAGCTGG AATGGCAAGT 

1451 GCAGAAAATA GGRAC AGTTC TATACAGTGC TCTCATTTAC TAATAACATA 

1501 AT QCCTTCTA AATAATTTTT TTQGQAAACT ACATTATCAC AAAATTATAC 

1551 AAATTTTTTT ACAAGTATTT ACATACTGTA TCTGAAAACA GACTTTAAAG 

1601 TCACAA GATT ATAAATGTAC ATATGTATXC OXaCATTCTG AAAAATAACA 

1651 TTC TCAGA AT CCACAGAAAA TATACTTAGT TACTACTGRA GATAA TXTIT 

1701 GAAATCTAAA AATTAGATTT AAATAGTATA TTTTAAATGA CAGAACTATA 

1751 ATEACA GAGA TCAGATCAGA TAGGTAAACT GCAAGATAGA TAGGATCAAA 

1801 Cr mijGC CT ACTGTATEAC TTACAGAGTT TTTTTGTGTG TGGrrTTTAA 

1851 AACTGTTAAG GCAAGAAGTG TCAAATGCTT TAGAGTTAAA TAACAGATCA 

1901 CTGATTICAA AGACTTGGTG TATAGTGTTA AAAATTAAAG CTTAAAAGGT 

1951 GGTEAGAAAA GTOGATEAAT GCAAAAGGGG TAATAAAGAC OGCAACATTC

2001 TCAGGACCAA ATXAAACTGC T 
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Figure 20 



1 EDODVGSRTG SISSSVSVPA KPERRPSLPP SKQANKNLIL KAISEAQESV 

51 TKTTOYSTVP QKQTLPVAPR TRTSQEELLA EWQGQSRTP RISPPIKEEE 

101 TKGDSVEKNQ AEMSELSVAQ KPEKLLERCK YWPACKNGDE CAYHHPISPC 

151 KAFPNCKFAE KCLFVHPNCK YDftKCTKPDC PFTHVSRRIP VLSPKPVAPP 

201 APPSSSQLCR YFPACKKMEC PFYHPKHCRF NTQCTSPDCT FYHPTINVPP 

251 RHALKWIRPQ TSE*HPVLPG RRSCSLEVFM Y^^KTLYBTC QIFETWNILL 

301 S*VEVLLPiy LKCLIFQVCK FIMWF*HWVF LFCFYYEKTA •GRAKFC-NI 

351 WGMFVBTO CE DQHMKLTSWL VMVLQLRGLH GCCVSGQIQ^ GSCHYSKNCT 

401 TFTFPKCYIM FIIHHENSIG QBY-GCLKYS ILXiFSF*VNL V»KA»LYRPI, 

451 EAECYFR^SS SFPAFCESMJ EVGMDSGGSW NGKCRK»BQF YTVLSFINNI 

501 MPSK^FFWET TLSQNYTNFF TSIYILYLKT DKECVTRL^MY ICILTF^KIT 

551 FSESTENILS YY^R^FLKCK N.I*IVYFK* QNYNVRDQIR ♦VNCKIDRMK 

601 LLAYCITYRV FLCWFKTVK ARSVKCFRVK •QITDFKCa.V YSVKN-SLKG 

651 G^KSGLMQKG ••RLQHSQDQ IKL 



EDDDYGSRTG SISSSVSVPA KPERRPSLPP SKQANKNLIL KAISEAQESV 
TRTTOYSTVP QKQTLPVAPR TRTSQEELLA EWQGQSRTP RISPPIKEEE 
TKGDSVEKNQ AQSSELSVAQ KPEKULER CK YWPACKNGDE CAYH HPISPC 
KA FPNCKFAE KCLFVHP N CK YTftKCTKPDC PFTH VSRRIP VLSPKPVAPP 
APPSSSQL CR YFPACKKMEC PFYH PKH CRF NTQCTSPDCT FYHP TINVPP 
RRAUCWIRPQ TSE * 
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Figure 21 



1 AAAACTTTCG GAAGAGAAAG TTGCCTCTCG TAAGTTCAGT TGTTAAAGTA 

51 AAAAAATTCA ATOITGATCG AOMGAGGAG GAAGAAGATG ATGATTACX3G 

101 GTCTCGAACA GGAAGCATCT CCAGCAGTGT GTCTGTGCCT GCAAAGCXTTG 

151 AAAGGAGACC TTCTCTTCCA CCTTCTAAAC AAGCTAACAA GAATCTGATT 

201 TTGAAGGCTA TATCTGAAGC TCAAGAATCC GTAACAAAAA CAACTAACTA 

251 CTCTACAGTT CCACAGAAAC AGACACTTCC hGTTGCTCCC AGAACTCGAA 

301 CTTCTCAAGA AGAATTGCTA GCAGAA3TGG TCCAGQGGAC AAAGTAGGAC 

351 CCCCAGAATA AQTCCCCCCA TTAAAGAAGA GGAAACAAAA GGAGATTCTG 

401 TAGAAAAAAA TCAAGATTAC TATGACATGG AATCCATGGT CCATGCAGAC 

451 ACAAGATCAT TTATTCTQAA GAAGCCAAAG CTOTCTGAGG AAGTANTAGT 

501 GGCACCAAAC CAAGAMTCGG GGATGAAGAC TGCAGATTCC dTCGGGTTC 

551 TTTCAGGGAC CCTTATGCAG ACACNAGATC TTGTrCAACC AGATAAACCT 

601 GCAAGTCCCA AG 



1 ktfgresclw •vqllk^kns immekrrkkm mxtolbqeas pavclclqsl 

51 kgdllfhlu^ kltri«f»rl ylklknp-qk qlttlqfhrn rhfqixpele 

101 lucknc«qkw srgqsrma sppikeeetec gdsvekeqdy yzsiesmvhad 

151 TRSFUKKPK lseevxvapn qxsgmrtads uwlsgtlmq txdlvqpdkp 

201 ASFK 



1 NAGCTGCTCT GACXXXSNAGN GGAATGNATG GM3GCTTGTT CNGAAACNJ^ 

51 CXAGATGGCG NGAGGGGGAC AAGTAGCGGC GTGATONAGA AGAGGGAGGT 

101 GAGGGTOCTC ACATCACX3>IC ATCTOACCAT GNCGNGCCNT CCXXANTANT 

151 AANANTGATG ATAGNGGGAA GTGGGCXXAC CCAGAAGCNT GATTGAGCX^G 

201 CCGCCAGTAN GAAAGNNGTT TGTtXANTTA GNCATACNKA 139GTAGGGTT 

251 CNAGO roOGT CCCOGGCAOC NQCANANNNM CNNCNGGGAC NAO^SCCCNN 

301 MNNTNNGTTA NNCNSNGNAG NNAAAAAATT CAATCATGAT GGAGAAGAGG 

351 AGGAA GAAGA TCATGATTAC GGGTCTCGAA CAGGAAGCAT CTCCAGCAGT 

401 GTCTCTGTGC CTGCAAA 



Untitled translated in RF 2 

1 SCSDG3CXNXW XLVXKXARHK EGDK*RREKE EGGESGXHITX SXKXXXSPXX 

51 XXMXXGSGPT QKX D^AAASX KXVCFXXHXX XRVXXASPAX AXXXXGXXPX 

101 XXUCXXXKKF NHIXyEtiilKD DPyGSRTGSI SSSV5VPA 
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Figure 22 

Cm-9aff-2 

GA AAA CAA ATG GAA GAA ATG CAA AAG GCT TTC AAT AAA ACA ATC GTG 
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CH14-2a16-1 

TG TTT GTT CAC CCA AAT TGT AAA TAT GAT GCA AAG TGT ACT AAA CCA 
GAT TGT CCC TTC ACT CAT GTG AGT AGA AGA ATT CCA GTA CTG TCT CCA 
AAA CCA GTT GCA CCA CCA G 

Phe Val His Pro Asn Cys Lys Tyr Asp Ala Lys Cys Thr Lys Pro Asp 
Cys Pro Phe Thr His Val Ser Arg Arg Ile Pro Val Leu Ser Pro Lys
Pro Val Ala Pro Pro 
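
The three-letter translation shown for CH14-2a16-1 can be reproduced by reading the nucleotide sequence in codons after skipping its two leading bases (TG) and ignoring the single trailing base. The following is a minimal Python sketch, not part of the original disclosure; it assumes the standard genetic code, and the names `translate`, `CODON_TABLE` and `THREE_LETTER` are illustrative.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# One-letter to three-letter amino acid codes ("*" marks a stop codon).
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "*",
}


def translate(seq: str, offset: int = 0) -> list[str]:
    """Translate DNA into three-letter residues, starting at `offset`.

    Spaces are ignored and any incomplete codon at the 3' end is dropped.
    """
    s = seq.replace(" ", "").upper()
    codons = [s[i:i + 3] for i in range(offset, len(s) - 2, 3)]
    return [THREE_LETTER[CODON_TABLE[c]] for c in codons]


# CH14-2a16-1 nucleotide sequence as printed in the figure above.
ch14_fragment = (
    "TG TTT GTT CAC CCA AAT TGT AAA TAT GAT GCA AAG TGT ACT AAA CCA "
    "GAT TGT CCC TTC ACT CAT GTG AGT AGA AGA ATT CCA GTA CTG TCT CCA "
    "AAA CCA GTT GCA CCA CCA G"
)
residues = translate(ch14_fragment, offset=2)
```

Changing `offset` to 0 or 1 gives the other two forward reading frames, which is how a frame-specific translation such as the "RF 2" listing in Figure 21 would be obtained under the same assumptions.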
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Figure 23(A) 



CTCAGAGAGG GCTGCCAGGA CGCGAGCCAC TGAGGAGCCG CTCAGCCAGC 
GCCATAGCCC TTAGGACTAT CGGTCACATT CTCGCGCTCC TGCTCCGGCT 
CCTCCATCTT GGCCTCGGCA GTGGCGGCTG CCGGGAGGAT GTGCCGCCTT 
CTGGCAGGGG GAAGAAGGAG GAGAAGATGA AGAAGCACCG GCGGGCCTTG 
GCCCTGGTCT CCTGCCTCTT TCTGTGCTCT CTGGTCTGGC TTCCCAGCTG 
GCGTGTATGT TGTAAAGAGA GTTCCTCAGC TTCAGCGTCA TCATATTACT 
CTCAAGATGA CAACTGCGCA CTAGAAAATG AAGATGTACA ATTCCAGAAA 
AAGAATACAG AGTCAAAAAA GTTAAGTCCA CCGGTGGTGG AGACACTCCC 
TACAGTTGAT TTGCATGAAG AGTCTTCCAA TGCAGTTGTG GACAGTGAAA 
CTGTTGAAAA TATTTCCAGC TCATCTACCT CAGAAATCAC TCCAATCTCA 
AAGCTTGATG AAATAGAAAA ATCTGGTACT ATTCCGATAG CCAAACCAAG 
TGAAACTGAG CAGTCTGAAA CTGATTGTGA TGTTGGTGAG GCCCTTGATG 
CTAGTGCTCC AATTGAACAA CCTTCCTTTG TCAGTCCACC TGACAGCCTT 
GTTGGCCAGC ATATAGAAAA TGTATCATCT TCACATGGTA AAGGAAAGAT 
AACAAAATCA GAATTTGAAT CAAAAGTTTC AGCAAGTGAA CAGGGCGGTG 
GTGATCCAAA ATCTGCATTG AATGCTTCAG ATAATTTAAA AAATGAGAGC 
TCTGATTATA CAAAACCAGG AGACATTGAC CCTACATCAG TAGCAAGTCC 
CAAAGATCCA GAAGATATAC CAACATTTGA TGAATGGAAG AAGAAAGTTA 
TGGAAGTAGA AAAAGAAAAA AGTCAGTCGA TGCATGCATC TTCTAATGGA 
GGTTCACATG CCACCAAAAA GGTCCAGAAA AATCGAAATA ATTATGCCTC 
AGTAGAATGT GGTGCCAAAA TTCTAGCAGC TAATCCAGAA GCCAAGAGCA 
CATCTGCTAT TCTTATAGAA AATATGGATC TTTACATGTT GAATCCTTGC 
AGCACTAAAA TTTGGTTTGT TATTGAACTT TGTGAACCAA TTCAAGTAAA 
ACAGCTTGAT ATTGCAAATT ATGAATTATT TTCTTCTACT CCTAAAGATT 
TTCTGGTTTC TATCAGTGAC AGATATCCAA CAAATAAGTG GATTAAGCTG 
GGTACTTTTC ATGGTAGAGA TGAGCGGAAT GTACAGAGTT TCCCTTTAGA 
TGAACAGATG TATGCAAAAT ATGTCAAGGT TGAGTTGCTA TCACATTTTG 
GATCAGAGCA CTTTTGTCCA TTAAGCCTTA TAAGGGTATT TGGCACTAAC 
ATGGTGGAAG AATATGAAGA AATTGCTGAT TCCCAGTATC ACTCAGAACG 
CCAGGAACTA TTTGATGAGG ACTATGATTA TCCACTGGAT TATAATACTG 
GAGAGGATAA ATCCTCAAAA AATCTTCTTG GTTCTGCTAC AAATGCCATT 
CTAAATATGG TGAATATTGC TGCTAATATT CTGGGAGCAA AAACTGAAGA 
CCTGACAGAA GGAAATAAAA GTATATCTGA GAATGCCACT GCCACAGCTG 
CACCTAAAAT GCCTGAATCA ACTCCTGTTT CAACTCCTGT TCCATCTCCT 
GAGTATGTAA CCACTGAAGT ACACACACAT GACATGGAGC CGTCAACACC 
AGATACTCCA AAAGAGAGTC CCATTGTACA GTTAGTTCAA GAGGAGGAAG 
AGGAGGCAAG TCCATCTACA GTGACCCTTC TGGGCAGCGG TGAACAGGAA 
GATGAATCAT CACCCTGGTT TGAGTCAGAG ACACAAATAT TTTGCAGTGA 
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Figure 23(B) 



ACTGACCACA ATTTGTTGTA TTTCTAGTTT TTCAGAATAC ATATATAAAT 
GGTGTTCAGT TAGAGTTGCT CTTTATCGGC AGCGCAGCCG AACTGCTTTG 
AGTAAAGGAA AAGATTATCT TGTGTTAGCT CAACCACCCT TACTACTTCC 
TGCGGAATCA GTAGATGTTT CAGTATTGCA ACCTCTGAGT GGAGAATTGG 
AAAATACGAA TATAGAAAGG GAAGCTGAAA CTGTTGTTCT GGGTGATTTA 
AGTAGTAGTA TGCACCAGGA TGACTTGGTG AATCACACTG TAGATGCAGT 
TGAACTTGAA CCAAGCCATT CTCAAACTCT TTCTCAGTCT CTTCTTTTAG 
ATATTACCCC AGAAATCAAT CCCTTGCCTA AAATAGAAGT ATCTGAGTCT 
GTTGAATATG AGGCAGGACA TATACCATCA CCAGTGATTC CCCAAGAGAG 
TTCTGTTGAG ATCGATAATG AAACAGAACA AAAGTCTGAG AGCTTTAGTT 
CTATAGAGAA ACCATCTATT ACCTATGAAA CAAATAAAGT TAATGAGTTA 
ATGGATAATA TTATAAAAGA AGATATGAAC TCCATGCAAA TTTTCACAAA 
GCTGTCTGAA ACAATAGTGC CACCAATAAA TACAGCCACT GTACCCGACA 
ATGAAGATGG GGAAGCCAAA ATGAATATAG CTGACACAGC AAAGCAAACT 
TTGATTTCTG TTGTGGATTC TTCTTCATTA CCTGAAGTAA AAGAAGAAGA 
ACAGTCTCCA GAAGATGCCC TTTTGAGAGG GTTACAGAGG ACAGCTACAG 
ATTTTTATGC TGAATTGCAA AATTCTACAG ATCTAGGATA TGCTAATGGA 
AATCTTGTAC ATGGATCAAA CCAAAAGGAG TCAGTATTTA TGAGACTTAA 
TAATCGTATT AAAGCCTTAG AAGTTAACAT GTCTCTCAGT GGTCGCTATC 
TGGAGGAGCT TAGCCAAAGG TACCGAAAAC AAATGGAAGA AATGCAAAAG 
GCTTTCAACA AAACAATCGT GAAACTTCAG AATACTTCAA GAATAGCAGA 
GGAGCAGGAT CAGCGGCAAA CTGAAGCCAT CCAGTTGCTA CAGGCACAGC 
TGACCAACAT GACACAGCTT GTTTCAAATT TATCAGCAAC AGTAGCAGAA 
TTGAAACGGG AGGTTTCAGA TCGACAAAGC TATCTTGTCA TATCTTTGGT 
TCTTTGTGTT GTCTTGGGAC TGATGCTTTG TATGCAGCGT TGTCGAAATA 
CTTCTCAATT TGATGGAGAT TATATTTCAA AACTTCCTAA AAGTAATCAG 
TATCCAAGCC CTAAAAGGTG TTTCTCTTCC TATGATGATA TGAATTTGAA 
AAGAAGAACT TCATTCCCAC TCATGAGATC CAAGTCTCTA CAGTTAACTG 
GCAAAGAAGT AGACCCAAAT GATTTGTACA TTGTAGAACC CCTCAAGTTT 
TCTCCAGAAA AGAAGAAGAA GCGCTGCAAG TACAAAATTG AAAAAATTGA 
GACCATAAAG CCTGAAGAAC CATTGCACCC CATAGCCAAT GGCGACATAA 
AAGGAAGAAA GCCCTTTACG AACCAGAGAG ATTTTTCTAA TATGGGAGAA 
GTTTATCACT CTTCTTATAA AGGTCCTCCA TCTGAAGGAA GCTCAGAAAC 
TTCATCACAG TCAGAAGAGT CCTATTTTTG TGGCATTTCA GCTTGCACAA 
GTCTGTGCAA TGGAGAGTCT CAAAAGACAA AAACTGAGAA GAGGGCTTTA 
AAACGAAGAC GATCTAAAGT CCAAGACCAA GGAAAATTGA TAAAAACTCT 
AATACAGACT AAGTCGGGAT CATTGCCGAG CCTGCATGAC ATAATCAAAG 
GAAACAAAGA GATCACCGTG GGAACATTTG GTGTTACAGC AGTCTCGGGA 
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Figure 23(C) 



CATATCTAAA ATTAATTGAA CTTTTCATAC AGAAGACTTT TTTGTTGTTG 
TTCTTTGAAG AACAGTCTGT AGTATTTGAA GGGTTTGGGG GAGGGAGAAA 
ATATTAATGG GAAAGGCATT CAGAAATTAT GGTTTCTACC TTTTTAAAAA 
GTAGATGGGA TTGTGCTCAA TCTTGGTTAA TGAGCTACAG TTTTACAAAG 
CTGATCACTT CCTATAAGGA CAATGGTAGA CATTTTATAA AGATGTTTTT 
TCACAAGATT AATTACTGGG ACAAAAGTAA TTTGGAAGCC CAGTTCCTTA 
GGTGGGATAG GAATGAAAGC CTAAACCTCT TCCTTTAGCT TTGTTCCTAT 
TTCTTGCACC TTCCCATATT TATGTGCCTT TTGTCTATTT ATAATGCCAC 
TGGAAGAGGA GGGATAACTT TTTCTGTTAT TTGATTTCTT TTATAACTTT 
GTTAGGTTTT TGAAGCTGCA AACACTACAA TGCTTTGAGG GGGTCTGTGC 
CTGAAGCTCA GGAGTGTGGA TCAGACAGTC TAAAGATCCT AAAAACTTGC 
CAACTGGATC TTTGTTTAGC AAACTCACTG GAAATGAACA CTTAATGGAA 
TTTTTAAGTC TGTTCTGTTA GGTAGATGGT GATGCTCTTG TTATTTTCAC 
TTATTCAGGC TGGATTACTT CTTACTTAGT TACTAACTCA ATGAGGAAAA 
AATCCCTACA GGATCTTTTT TTGCAAACAA CTGATATATG CAGACAAATT 
TTTGACAAAT TCACCTTTTA AACACGACGT TAACCGATTT GTGAAGGTTT 
TCTTTAGCTT ACATTTTAAA CATACACAAT AAACACTAAT CCTCCAAACT 
TTCACTGTTT TTATTAGTAT GAATATAAGA TTTGAAGGTT TGGCCAATTA 
GTACAAGTCT CATGATATAA TCACAGCCTG CATACATATG CACAGATCCA 
GTTAGTGAGT TTGTCAAGCT TAATCTAATT GGTTAAGTCT AAAGAGATTA 
TTATTCCTTG ATGTTTGCTT TGTATTGGCT ACAAATGTGC AGAGGTAATA 
CATATGTGAT GTCGATGTCT CTGTCTTTTT TTTTGTCTTT AAAAAATAAT 
TGGCAGCAAC TGTATTTGAA TAAAATGATT TCTTAGTATG ATTGTACAGT 
AATGAATGAA AGTGGAACAT GTTTCTTTTT GAAAGGGAGA GAATTGACCA 
TTTATTGTTG TGATGTTTAA GTTATAACTT ATTGAGCACT TTTAGTAGTG 
ATAACTGTTT TTAAACTTGC CTAATACCTT TCTTGGGTAT TGTTTGTAAT 
GTGACTTATT TAACGCCTTC TTTGTTTGTT TAAGTTGCTG CTTTAGGTTA 
ACAGCGTGTT TTAGAAGATT TAAATTTCTT TCCTGTCTGC ACAATTAGCT 
ATTCAGAGCA AGAGGGCCTG ATTTTATAGA AGCCCCTTGA AAAGAGGTCC 
AGATGAGAGC AGAGATACAG TGAGAAATTA TGTGATCTGT GTGTTGTGGG 
AAGAGAATTT TCAATATGTA ACTACGGAGC TGTAGTGCCA TTAGAAACTG 
TGAATTTCCA AATAAATCTG AACACTTGTC TTTATT 
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Figure 24 



QRGLPGREPL RSRSASAIAL RTIGHILALL LRLLHLGLGS GGCREDVPPS 
GRGKKEEKMK KHRRALALVS CLFLCSLVWL PSWRVCCKES SSASASSYYS 
QDDNCALENE DVQFQKKNTE SKKLSPPVVE TLPTVDLHEE SSNAVVDSET 
VENISSSSTS EITPISKLDE IEKSGTIPIA KPSETEQSET DCDVGEALDA 
SAPIEQPSFV SPPDSLVGQH IENVSSSHGK GKITKSEFES KVSASEQGGG 
DPKSALNASD NLKNESSDYT KPGDIDPTSV ASPKDPEDIP TFDEWKKKVM 
EVEKEKSQSM HASSNGGSHA TKKVQKNRNN YASVECGAKI LAANPEAKST 
SAILIENMDL YMLNPCSTKI WFVIELCEPI QVKQLDIANY ELFSSTPKDF 
LVSISDRYPT NKWIKLGTFH GRDERNVQSF PLDEQMYAKY VKVELLSHFG 
SEHFCPLSLI RVFGTNMVEE YEEIADSQYH SERQELFDED YDYPLDYNTG 
EDKSSKNLLG SATNAILNMV NIAANILGAK TEDLTEGNKS ISENATATAA 
PKMPESTPVS TPVPSPEYVT TEVHTHDMEP STPDTPKESP IVQLVQEEEE 
EASPSTVTLL GSGEQEDESS PWFESETQIF CSELTTICCI SSFSEYIYKW 
CSVRVALYRQ RSRTALSKGK DYLVLAQPPL LLPAESVDVS VLQPLSGELE 
NTNIEREAET VVLGDLSSSM HQDDLVNHTV DAVELEPSHS QTLSQSLLLD 
ITPEINPLPK IEVSESVEYE AGHIPSPVIP QESSVEIDNE TEQKSESFSS 
IEKPSITYET NKVNELMDNI IKEDMNSMQI FTKLSETIVP PINTATVPDN 
EDGEAKMNIA DTAKQTLISV VDSSSLPEVK EEEQSPEDAL LRGLQRTATD 
FYAELQNSTD LGYANGNLVH GSNQKESVFM RLNNRIKALE VNMSLSGRYL 
EELSQRYRKQ MEEMQKAFNK TIVKLQNTSR IAEEQDQRQT EAIQLLQAQL 
TNMTQLVSNL SATVAELKRE VSDRQSYLVI SLVLCVVLGL MLCMQRCRNT 
SQFDGDYISK LPKSNQYPSP KRCFSSYDDM NLKRRTSFPL MRSKSLQLTG 
KEVDPNDLYI VEPLKFSPEK KKKRCKYKIE KIETIKPEEP LHPIANGDIK 
GRKPFTNQRD FSNMGEVYHS SYKGPPSEGS SETSSQSEES YFCGISACTS 
LCNGQSQKTK TEKRALKRRR SKVQDQGKLI KTLIQTKSGS LPSLHDIIKG 
NKEITVGTFG VTAVSGHI*N *LNFSYRRLF CCCSLKNSL* YLKGLGEGEN 
INGKGIQKLW FLPF*KVDGI VLNLG**ATV LQS*SLPIRT MVDIL*RCFF 
TRLITGTKVI WKPSSLGGIG MKA*TSSFSF VPISCTFPYL CAFCLFIMPL 
EEEG*LFLLF DFFYNFVRFL KLQTLQCFEG VCA*SSGVWI RQSKDPKNLP 
TGSLFSKLTG NEHLMEFLSL FC*VDGDALV IFTYSGWITS YLVTNSMRKK 
SLQDLFLQTT DICRQIFDKF TF*TRR*PIC EGFL*LTF*T YTINTNPPNF 
HCFY*YEYKI *RFGQLVQVS *YNHSLHTYA QIQLVSLSSL I*LVKSKEII 
IP*CLLCIGY KCAEVIHM*C RCLCLFFCL* KIIGSNCI*I K*FLSMIVQ* 
*MKVEHVSF* KGEN*PFIVV MFKL*LIEHF ****LFLNLP NTFLGYCL*C 
DLFNAFFVCL SCCFRLTACF RRFKFLSCLH N*LFRARGPD FIEAP*KEVQ 
MRAEIQ*EIM *SVCCGKRIF NM*LRSCSAI RNCEFPNKSE HLSL 
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Figure 25(A) 



TAGAATTCAG CGGCCGCTGA ATTCTAGCTG CGGGGTAGGA GTCCGCGGCA 
GCCTCCGGGT AAGCCAAGCG CCGCGCAGTG CTGAGTTCCC GCACGCCGCA 
GAGCCATGGA GATCGGCACC GAGACCAGCC GCAAGATCCG GAGTGCCATT 
AAGGGGAAAT TACAAGAATT AGGAGCTTAT GTTGATGAAG AACTTCCTGA 
TTACATTATG GTGATGGTGG CCAACAAGAA AAGTCAGGAC CAAATGACAG 
AGGATCTGTC CCTGTTTCTA GGGAACAACA CAATTCGATT CACCGTATGG 
CTTCATGGTG TATTAGATAA ACTTCGCTCT GTTACAACTG AACCCTCTAG 
TCTGAAGTCT TCTGATACCA ACATCTTTGA TAGTAACGTG CCTTCAAACA 
AGAACAATTT CAGTCGGGGA GATGAGAGGA GGCATGAAGC TGCAGTGCCA 
CCACTTGCCA TTCCTAGCGC GAGACCTGAA AAAAGAGATT CCAGAGTTTC 
TACAAGTTCG CAGGAGTCAA AAACCACAAA TGTCAGACAG ACTTACGATG 
ATGGAGCTGC AACCCGACTA ATGTCAACAG TGAAACCTTT GAGGGAGCCA 
GCACCCTCTG AAGATGTGAT TGATATTAAG CCAGAACCAG ATGATCTCAT 
TGACGAAGAC CTCAACTTTG TGCAGGAGAA TCCCTTATCT CAGAAAGAAC 
CTACAGTGAC ACTTACATAT GGTTCTTCTC GCCCTTCTAT TGAAATTTAT 
CGACCACCTG CAAGTAGAAA TGCAGATAGT GGTGTTCATT TAAACAGGTT 
GCAATTTCAA CAGCAGCAGA ATAGTATTCA TGCTGCCAAG CAGCTTGATA 
TGCAGAGTAG TTGGGTATAT GAAACAGGAC GTTTGTGTGA ACCAGAGGTG 
CTTAACAGCT TAGAAGAAAC GTATAGTCCG TTCTTTAGAA ACAACTCGGA 
GAAAATGAGT ATGGAGGATG AAAACTTTCG GAAGAGAAAG TTGCCTGTGG 
TAAGTTCAGT TGTTAAAGTA AAAAAATTCA ATCATGATGG AGAAGAGGAG 
GAAGGAGATG ATGATTACGG GTCTCGAACA GGAAGCATCT CCAGCAGTGT 
GTCTGTGCCT GCAAAGCCTG AAAGGAGACC TTCTCTTCCA CCTTCTAAAC 
AAGCTAACAA GAATCTGATT TTGAAGGCTA TATCTGAAGC TCAAGAATCC 
GTAACAAAAA CAACTAACTA CTCTACAGTT CCACAGAAAC AGACACTTCC 
AGTTGCTCCC AGAACTCGAA CTTCTCAAGA AGAATTGCTA GCAGAAGTGG 
TCCAGGGACA AAGTAGGACC CCCAGAATAA GTCCCCCCAT TAAAGAAGAG 
GAAACAAAAG GAGATTCTGT AGAAAAAAAT CAAGCTGAGA TGAGTGAACT 
GAGTGTGGCA CAGAAACCAG AAAAACTTTT GGAGCGCTGC AAGTACTGGC 
CTGCTTGTAA AAATGGGGAT GAGTGTGCCT ACCATCACCC CATCTCACCC 
TGCAAAGCCT TCCCCAATTG TAAATTTGCT GAAAAATGTT TGTTTGTTCA 
CCCAAATTGT AAATATGATG CAAAGTGTAC TAAACCAGAT TGTCCCTTCA 
CTCATGTGAG TAGAAGAATT CCAGTACTGT CTCCAAAACC AGTTGCACCA 
CCAGCACCAC CTTCCAGTAG TCAGCTCTGC CGTTACTTCC CTGCTTGTAA 
GAAGATGGAA TGTCCCTTCT ATCATCCAAA ACATTGTAGG TTTAACACTC 
AATGTACAAG TCCGGACTGC ACATTCTACC ATCCCACCAT TAATGTCCCA 
CCACGACATG CCTTGAAATG GATTCGACCT CAAACCAGCG AATAGCACCC 
AGTCCTGCCT GGCAGAAGAT CATGCAGTTT GGAAGTTTTC ATGTACTGAT 
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Figure 25(B) 



GAAAGATACT CTACAGAACT TGTCAAATCT TTGAAACTTG GAATATATTG 
CTTTCATAAT ATGAAGTTTT ATTGCCTATC TATCTGAAGT GTCTAATTTT 
TCAAGTTTGT AAGTTTATTA TGTGGTTTTA ACATTGGGTG TTTTTGTTTT 
GTTTTTACTA TGAAAAGACA GCTTAAGGAA GAGCTAAATT CTGTTAAAAT 
ATTTGGGGCA TGTTTGTGCA CTGCTGTTGT GAGGATCAGC ATATGAAATT 
GACATCATGG TTAGTCATGG TACTGCAGCT TAGGGGGCTA CACGGTTGCT 
GTGTGAGTGG AGAGATGCAG TGAGGCAGTT GTCATTATTC TAAAAATTGT 
ACTACTTTCA CTTTTCCCAA AGATTATATA ATGTTCATAA TCCACCATGA 
AAACAGCATT GGCCAAAGGT ACTGAGGCTG CTTAAAATAT TCAATTCTGC 
TTTTTAATTT TTAAGTGAAT TTAGTTTGAA AAGCATGATT ATACAGGCCT 
CTCAGGCTGA GTGCTACTTT CGGTAAAGTT CCAGTTTTCC TGCCTTCTGT 
GACAGGATGA ATGAGGTGGG TATGGACAGT GGAGGCAGCT GGAATGGCAA 
GTGCAGAAAA TAGGAACAGT TCTATACAGT GCTCTCATTT ACTAATAACA 
TAATGCCTTC TAAATAATTT TTTTGGGAAA CTACATTATC ACAAAATTAT 
ACAAATTTTT TTACAAGTAT TTACATACTG TATCTGAATU^ CAGACTTTAA 
AGTCACAAGA TTATAAATGT ACATATGTAT TCTCACATTC TGAAAAATAA 
CATTCTCAGA ATCCACAGAA AATATACTTA GTTACTACTG AAGATAATTT 
TTGAAATGTA AAAATTAGAT TTAAATAGTA TATTTTAAAT GACAGAACTA 
TAATTACAGA GATCAGATCA GATAGGTAAA CTGCAAGATA GATAGGATGA 
AACTTTTGGC CTACTGTATT ACTTACAGAG TTTTTTTGTG TGTGGTTTTT 
AAAACTGTTA AGGCT^GAAG TGTCAAATGC TTTAGAGTTA AATAACAGAT 
CACTGATTTC AAAGACTTGG TGTATAGTGT TAAAAATTAA AGCTTAAAAG 
GTGGTTAGAA AAGTGGATTA ATGCAAAAGG GGTAATAAAG ACTGCAACAT 
TCTCAGGACC AAATTAAACT GCTAA 



Figure 26 



*NSAAAEF*L RGRSPRQPPG KPSAAQC*VP ARRRAMEIGT ETSRKIRSAI 
KGKLQELGAY VDEELPDYIM VMVANKKSQD QMTEDLSLFL GNNTIRFTVW 
LHGVLDKLRS VTTEPSSLKS SDTNIFDSNV PSNKNNFSRG DERRHEAAVP 
PLAIPSARPE KRDSRVSTSS QESKTTNVRQ TYDDGAATRL MSTVKPLREP 
APSEDVIDIK PEPDDLIDED LNFVQENPLS QKEPTVTLTY GSSRPSIEIY 
RPPASRNADS GVHLNRLQFQ QQQNSIHAAK QLDMQSSWVY ETGRLCEPEV 
LNSLEETYSP FFRNNSEKMS MEDENFRKRK LPVVSSVVKV KKFNHDGEEE 
EGDDDYGSRT GSISSSVSVP AKPERRPSLP PSKQANKNLI LKAISEAQES 
VTKTTNYSTV PQKQTLPVAP RTRTSQEELL AEVVQGQSRT PRISPPIKEE 
ETKGDSVEKN QAEMSELSVA QKPEKLLERC KYWPACKNGD ECAYHHPISP 
CKAFPNCKFA EKCLFVHPNC KYDAKCTKPD CPFTHVSRRI PVLSPKPVAP 
PAPPSSSQLC RYFPACKKME CPFYHPKHCR FNTQCTSPDC TFYHPTINVP 
PRHALKWIRP QTSE*HPVLP GRRSCSLEVF MY**KILYRT CQIFETWNIL 
LS*YEVLLPI YLKCLIFQVC KFIMWF*HWV FLFCFYYEKT A*GRAKFC*N 
IWGMFVHCCC EDQHMKLTSW LVMVLQLRGL HGCCVSGEMQ *GSCHYSKNC 
TTFTFPKDYI MFIIHHENSI GQRY*GCLKY SILLFNF*VN LV*KA*LYRP 
LRLSATFGKV PVFLPSVTG* MRWVWTVEAA GMASAENRNS SIQCSHLLIT 
*CLLNNFFGK LHYHKIIQIF LQVFTYCI*K QTLKSQDYKC TYVFSHSEK* 
HSQNPQKIYL VTTEDNF*NV KIRFK*YILN DRTIITEIRS DR*TAR*IG* 
NFWPTVLLTE FFCVWFLKLL RQEVSNALEL NNRSLISKTW CIVLKIKA*K 
VVRKVD*CKR GNKDCNILRT KLNC* 



MEIGT ETSRKIRSAI 

KGKLQELGAY VDEELPDYIM VMVANKKSQD QMTEDLSLFL GNNTIRFTVW 
LHGVLDKLRS VTTEPSSLKS SDTNIFDSNV PSNKNNFSRG DERRHEAAVP 
PLAIPSARPE KRDSRVSTSS QESKTTNVRQ TYDDGAATRL MSTVKPLREP 
APSEDVIDIK PEPDDLIDED LNFVQENPLS QKEPTVTLTY GSSRPSIEIY 
RPPASRNADS GVHLNRLQFQ QQQNSIHAAK QLDMQSSWVY ETGRLCEPEV 
LNSLEETYSP FFRNNSEKMS MEDENFRKRK LPVVSSVVKV KKFNHDGEEE 
EGDDDYGSRT GSISSSVSVP AKPERRPSLP PSKQANKNLI LKAISEAQES 
VTKTTNYSTV PQKQTLPVAP RTRTSQEELL AEVVQGQSRT PRISPPIKEE 
ETKGDSVEKN QAEMSELSVA QKPEKLLERC KYWPACKNGD ECAYHHPISP 
CKAFPNCKFA EKCLFVHPNC KYDAKCTKPD CPFTHVSRRI PVLSPKPVAP 
PAPPSSSQLC RYFPACKKME CPFYHPKHCR FNTQCTSPDC TFYHPTINVP 
PRHALKWIRP QTSE 
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